amdu参数详解

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:amdu参数详解

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近发现amdu命令比以前认知中的强大,记录下相关参数

[oracle@xifenfei ~]$ amdu help=y
a/usize         AU size for corrupt disks
-ausize <bytes>: This option must be set when -baddisks is set. It
    must be a power of 2. This size is required to scan a disk looking
    for metadata, and it is normally read from the disk header. The
    value applies to all disks that do not have a valid header. The
    value from the disk header will be used if a valid header is
    found.

ba/ddisks               Include disks with bad headers
-baddisks <diskgroup>:  Normally disks with bad disk headers, or that
    look like they were never part of a disk group, will not be
    scanned. This option forces them to be scanned anyway and to be
    considered part of the given diskgroup. This is most useful when
    a disk header has been damaged. The disk will still need to have
    a valid allocation table to drive the scan unless -fullscan is
    used. In any case at least one block in the first two AUs must be
    valid so that the disk number can be determined. The options
    -ausize and -blksize are required since these values are normally
    fetched from the disk header. If the diskgroup uses external
    redundancy then -external should be specified. These values will
    be compared against any valid disks found in the diskgroup and
    they must be the same.

bl/ksize                ASM block size for corrupt disks
-blksize <bytes>: This option must be set when -baddisks is set. It
    must be a power of 2. This size is required to scan a disk looking
    for metadata, and it is normally read from the disk header. The
    value applies to all disks that do not have a valid header. The
    value from the disk header will be used if a valid header is
    found.

c/ompare                Compare file mirrors
-compare: This option only applies to file extraction from a normal or
    high redundancy disk group. Every extent that is mirrored on more
    than one discovered disk will have all sides of its mirror
    compared. If they are not identical a message will be reported
    on standard error and the report file. The message will indicate
    which copy was extracted. A count of the blocks that are not
    identical will be in the report file.

dir/ectory              Directory from previous dump
-directory <string>: This option completely eliminates the discovery
    phase of operation. It specifies the name of a dump directory from
    a previous run of AMDU. The report file and map files are read
    instead of doing a discovery and scan. The parsing of these ASCII
    files is very dependent on them being exactly as written by AMDU.
    AMDU is unlikely to work properly if they have been modified by
    a text editor, or if some of the files are missing or truncated.
    Note that the directory may be a copy FTPed from another
    machine. The other machine may even be a different platform
    with a different endianess.

dis/kstring             Diskstring for discovery
 -diskstring <string>: By default the null string is used for
    discovery. The null string should discover all disks the user has
    access to. Many installations specify an asm_diskstring parameter
    for their ASM instance. If so that parameter value should be given
    here. Multiple discovery strings can be specified by multiple
    occurrences of -diskstring <string>. Beware of shell syntax
    conflicts with discovery strings. Diskstrings are usually the same
    syntax the shell uses for expanding path names on command lines so
    they will most likely need to be enclosed in single quotes.

du/mp           Diskgroups to dump
-dump <diskgroup>: This option specifies the name of a diskgroup to
    have its metadata dumped. This option may be specified multiple
    times to dump multiple diskgroups. If the diskgroup name is ALL
    then all diskgroups encountered will be dumped. The diskgroup name
    is not case sensitive, but will be converted to uppercase for all
    reports. If this option is not specified then no map or image
    files will be created, but -extract and -print may still work.

exc/lude                Disks to exclude
-exclude <string>: Multiple exclude options may be specified. These
    strings are used for discovery just like the values for diskstring.
    Only shallow discovery is done on these diskstrings. Any disks
    found in the exclude discovery will not be accessed. If they are
    also discovered using the -diskstring strings, then the report will
    include the information from shallow discovery along with a message
    indicating the disk was excluded.

exte/rnal               Assume external redundancy
-external: Normally AMDU determines the diskgroup redundancy from the
    disk headers. However this is not possible with the -baddisks
    option. It is assumed that the redundancy of the -baddisks
    diskgroup is normal or high unless this option is given to specify
    external redundancy.

extr/act                Files to extract
-extract <diskgroup>.<file_number>: This extracts the numbered file
    from the named diskgroup, case insensitive. This option may be
    specified multiple times to extract multiple files. The extracted
    file is placed in the dump directory under the name
    <diskgroup>_<number>.f  where <diskgroup> is the diskgroup name
    in uppercase, and <number> is the file number. The -output option
    may be used to write the file to any location. The extracted file
    will appear to have the same contents it would have if accessed
    through the database. If some portion of the file is unavailable
    then that portion of the output file will be filled with
    0xBADFDA7A, and a message will appear on stderr.

fi/ledump               Dump files rather than extract
-filedump: This option causes the file objects in the command line to
    have their blocks dumped to the image files rather than extracted.
    This can be combined with the -novirtual option to selectively
    dump only some of the metadata files. It may also be used to dump
    user files (number >= 256) so that all mirrored copies can be
    examined.

fo/rmer         Include dropped disks
-former: Normally disks marked as former are not scanned, but this
    option will scan them and include their contents in the output.
    This is useful when it is necessary to look at the contents of a
    disk that was dropped. Note that dropped normal disks will not have
    any entries in their allocation tables and thus only the physically
    addressed extents will be dumped. Force dropped disks will not have
    status former in their disk headers and are not affected by this
    option. However if DROP DISKGROUP is used, the disks will have the
    contents as of the time of the drop, and will be in status former.
    Thus this option is useful for extracting files from a dropped
    diskgroup.

fu/llscan               Scan entire disk
-fullscan: This option reads every AU on the disk and looks at the
    contents of the AU rather than limiting the AU's read based on the
    allocation table. This is useful when the allocation table is
    corrupt or needs recovery. An AU will be written to the image file
    if it starts with a block that contains a valid ASM block header.
    The file and extent information for the map will be extracted from
    the block header. Physically addressed metadata will be dumped
    regardless of its contents. This option is incompatible with
    extracting a file. It is an error to specify -extract with this
    option. Note that this option is likely to find old garbage
    metadata in unallocated AU's since there is no means of
    determining what is allocated. Thus there may be many different
    copies of the same block, possibly of different versions.

h/ex            Always print block contents in hex
-hex: This prints the block contents in hex without attempting to print
    them as ASM metadata. This is useful when the block is known to not
    be ASM metadata. It avoids the ASM block header dump and ensures
    the block is not accidentally interpreted as ASM metadata. This
    option requires at least one -print option.

noa/cd          Do not dump ACD
-noacd: This option limits the dumping of the Active Change Directory
    to just the control blocks that contain the checkpoint. There is
    126 MB of ACD per ASM instance (42 MB for external redundancy). It
    is normally of no interest if there has been a clean shutdown or
    no updates for a while. This option avoids dumping a lot of
    unimportant data. The blocks will still be read and checked for
    corruption. The map file will still contain entries for the ACD
    extents, but the block counts will be zero.

nod/ir          Do not create a dump directory
-nodir: No dump directory is created, and no files are created in it.
    The directory name is not written to standard out. The report file
    is written to standard out before any block printouts from any
    -print options.  This option conflicts with -filedump. It is an
    error to specify this and extract a file to the dump directory.

noe/xtract              Do not create extracted file
-noextract: This prevents files from being extracted to an output
    file, but the file will be read and any errors in selecting the
    correct output will be reported. This is most useful in
    combination with the -compare option.

noh/eart                Do not check for heartbeat
-noheart: Normally the heartbeat block will be saved at discovery time
    and checked when the disk is scanned. A sleep is added between
    discovery and scanning to ensure there is time for the heartbeat
    to be written. If the heartbeat block changes then it is most
    likely that the diskgroup containing this disk is mounted by an
    active ASM instance. An error and warning is generated but
    operation proceeds normally. This option suppresses this check
    and avoids the sleep.

noi/mage                Do not create image files
-noimage: No image files will be created n the dump directory. All
    the reads specified by the read options will still be done. The
    map files may be used to find blocks on the disks themselves. In
    the map file, the count of blocks dumped, the image file sequence
    number, and the byte offset in the image file will all always be
    zero (C00000 S0000 B0000000000)

nom/ap          Do not create map or image files
-nomap: No map file is created and no image file is created. The only
    output is the report file. The -noimage option is assumed if this
    is set since an image file without a map is useless. The options
    -noscan and -noread also result in no map or image files, but
    -nomap still reads the metadata to check for I/O errors and corrupt
    blocks.

nop/rint                Do no print block contents
-noprint: This suppresses the printout of the block contents for
    blocks printed with the -print option. It is useful for getting
    just the block reports without a lot of data. This option requires
    at least one -print option.

norea/d         Shallow discovery only
-noread: This eliminates any reading of any disks at all. Only shallow
    discovery will be done. The report will end after the discovery
    section. It is an error to specify this option and specify a file
    to extract or blocks to print. It is an error to specify this
    and -fullscan.

norep/ort               Do not generate a report
-noreport: This suppresses the generation of the report file. It is
    most useful in combination with -nodir and -print to get block
    printouts without a lot of clutter. It is unnecessary to include
    this with -directory since no report is generated then anyway.

nosc/an         Deep discovery only
-noscan: This eliminates any reading of any disks after deep
    discovery. This results in just doing a deep discovery using the
    disksting parameter. The report will end after the discovery
    section. It is an error to specify this option and specify a file
    to extract. It is an error to specify this and -fullscan.

nosu/bdir               Do not create a dump directory
-nosubdir: No dump directory is created, but files are still created.
    The directory name is not written to standard out. The report file
    and any other dump or extract  files are written to the current
    directory or to the directory indicated by -parentdir. This means
    that if multiple AMDU dumps are requested using this option, the
    report file will always correspond to the last dump requested.

nov/irtual              Do not dump virtual metadata
-novirtual: This option eliminates reading of any virtual metadata.
    Only the physically addressed metadata will be read. This
    implicitly eliminates the ACD and extent maps so -noacd and
    -noxmap will be assumed.

nox/map         Do not dump extent maps
-noxmap: This option eliminates reading of the indirect extents
    containing the file extent maps. This is the bulk of the metadata
    in most diskgroups. Even the entries in the map file will be
    eliminated.

o/utput         Files to create for extract
-output <file_name>: This option specifies a different file for
    writing an extracted file. The file will be overwritten if it
    already exists. This option requires that exactly one file is
    extracted via the -extract option.

pa/rent         Parent for dump directory
-parent <path_name>: By default the dump directory is created in the
    current directory, but another directory can be specified using
    this option. The parent directory for the dump directory must
    already exist.

pr/int          Block to print
-print <block_spec>: This option prints one or more blocks to standard
    out. This option may be specified multiple times to print multiple
    <block_spec>s. The printout contains information about how each
    block was read as well as a formatted printout. Multiple blocks
    matching the same <block_spec> may be found when scanning the
    disks. For example there may be multiple disks that have headers
    for the same diskgroup and disk number. If the block is from a
    mirrored file then multiple copies should exist on different disks.
    If multiple copies of the same block have identical contents then
    only one formatted printout of the contents will be generated, but
    a header will be printed for each copy. A <block_spec> may include
    a count of sequential blocks to print. A <block_spec> may specify
    a block either by disk or file.
   <block_spec> ::= <single_block> | <single_block>.C<count>
   <single_block> ::= <report_disk_block> | <group_disk_block> |
        <extent_file_block> | <virtual_file_block> | <xmap_file_block>
   <report_disk_block> ::=
        <group_name>.N<report_number>.A<au_number>.B<block_number>
   <group_disk_block> ::=
         <group_name>.D<disk_number>.A<au_number>.B<block_number>
   <extent_file_block> ::=
         <group_name>.F<file_number>.X<physical_extent>.B<block_number>
   <virtual_file_block> ::= 
         <group_name>.F<file_number>.V<virtual_block_number>
   <xmap_file_block> ::=
         <group_name>.F<file_number>.M<extent_map_block_number>

r/egistry               Dump registry files
-registry: The ASM registries will be read and dumped to the image
    file. There will be no block consistency checks since these files
    do not have ASM cache headers. To dump one specific registry
    specify -filedump and include the file object for the registry
    (e.g. DATA.255).

s/pfile         Extract usable spfile
-spfile: This causes extract to render the resulting file in a form   
    that is directly usable by startup. Without this option, AMDU   
    will extract the file as a regular ASM file including all ASM   
    specific headers and such

asm磁盘加入vg恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:asm磁盘加入vg恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又一个客户把asm disk做成pv,加到vg中,并且对lv进行了扩展(ext4的文件系统)
asm-disk-pv


这个客户做了上述操作之后,没有对lv进行写入其他数据,所以破坏较少(主要的破坏就是ext4的每个一段就会置空一部分block预留给文件系统写入元数据使用),通过winhex查看被破坏磁盘发现lvm信息
lvm

对于这种情况,通过对文件头进行修复,结合工具直接拷贝出来数据文件(个别文件元数据损坏通过基于block的方式恢复dbf)
asm-dbf

然后直接恢复dbf中数据文件(对于异常的主要是segment header被置空的tab使用dul单独扫描处理),实现客户数据的最大限度恢复
以前类似文章:
asm disk被加入vg恢复
asm disk被分区,格式化为ext4恢复
pvcreate asm disk导致asm磁盘组异常恢复
再一起asm disk被格式化成ext3文件系统故障恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统

误删除asm disk导致磁盘组无法mount数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:误删除asm disk导致磁盘组无法mount数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户误删除asm disk两个lun(由于这个是这个存储的特殊性,删除lun之后,存储层面无法恢复出来对应的lun数据,导致客户彻底放弃了硬件层面恢复的可能性.),导致asm磁盘组无法正常mount

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:27928:40938} */ 
NOTE: cache registered group DATA number=3 incarn=0x60fa38b1
NOTE: cache began mount (first) of group DATA number=3 incarn=0x60fa38b1
NOTE: Assigning number (3,0) to disk (/dev/rdisk/VD02_DBF)
NOTE: Assigning number (3,1) to disk (/dev/rdisk/VD03_DBF)
NOTE: Assigning number (3,2) to disk (/dev/rdisk/VD04_DBF)
NOTE: Assigning number (3,3) to disk (/dev/rdisk/VD05_DBF)
NOTE: Assigning number (3,4) to disk (/dev/rdisk/VD06_DBF)
Thu Dec 29 10:21:20 2022
NOTE: GMON heartbeating for grp 3
GMON querying group 3 at 29 for pid 29, osid 3770
NOTE: Assigning number (3,5) to disk ()
NOTE: Assigning number (3,6) to disk ()
GMON querying group 3 at 30 for pid 29, osid 3770
NOTE: cache dismounting (clean) group 3/0x60FA38B1 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3770, image: oracle@db1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 3/0x60FA38B1 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=3 incarn=0x60fa38b1
NOTE: cache deleting context for group DATA 3/0x60fa38b1
GMON dismounting group 3 at 31 for pid 29, osid 3770
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "6" is missing from group number "3" 
ORA-15042: ASM disk "5" is missing from group number "3" 
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:27928:40938} */

这个客户应该有三个磁盘组存放数据文件,其中data磁盘组的7个磁盘被删除了2个lun,导致data磁盘组无法mount,客户希望尽可能恢复其中数据,对于这种情况,由于2个lun完全丢失,直接通过dul之类的工具拷贝asm数据文件恢复不可行(因为很多asm的元数据也会在丢失的lun里面,导致拷贝出来的数据文件异常太多,恢复效果会很差),对于这种情况采用asm disk header 彻底损坏恢复的恢复方法,尽可能的从block层面恢复出来所有可以恢复的数据块中的数据
20230209100350


由于这个其中涉及了system表空间(oracle损坏严重),结合客户几年前的一个system历史备份文件,恢复出来字典,然后尽可能的恢复数据文件,最终最大限度给客户恢复数据,让客户的损失降到最低.

asm disk被分区,格式化为ext4恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:asm disk被分区,格式化为ext4恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户因为没有认识到linux中的磁盘被asm使用,对其进行分区并且做成了ext4的文件系统,从history中获取客户操作命令

  600  fdisk -l
  601  fdisk /dev/sdb
  602  mkfs ext4 /dev/sdb1 
  603  fdisk -l
  604  mkfs -t ext4 /dev/sdb1 
  605  cd /
  606  mkdir u01
  607  mount /dev/sdb1 /u01
  608  df -h

确认磁盘情况,确认sdb直接被asm磁盘使用(asmdisk1)

[grid@racdb3 trace]$ ls -l /dev/asm*
brw-rw---- 1 grid asmadmin 8, 16 Sep 30 14:34 /dev/asmdisk1
[grid@racdb3 trace]$ ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Jul 27  2021 /dev/sda
brw-rw---- 1 root disk 8,  1 Jul 27  2021 /dev/sda1
brw-rw---- 1 root disk 8,  2 Jul 27  2021 /dev/sda2
brw-rw---- 1 root disk 8, 16 Sep 30 11:23 /dev/sdb
brw-rw---- 1 root disk 8, 17 Sep 30 11:23 /dev/sdb1
brw-rw---- 1 root disk 8, 32 Jul 27  2021 /dev/sdc

asm日志报错

Fri Sep 30 11:31:41 2022
NOTE: SMON starting instance recovery for group DATA domain 1 (mounted)
NOTE: SMON skipping disk 0 - no header
NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _smon_+asm3 (2989) initiating offline of disk 0.3915953109 (DATA_0000) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968b3d5, mask = 0x6a, op = clear
Fri Sep 30 11:31:41 2022
GMON updating disk modes for group 1 at 4 for pid 17, osid 2989
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Fri Sep 30 11:31:41 2022
NOTE: cache dismounting (not clean) group 1/0x34F84324 (DATA) 
WARNING: Offline for disk DATA_0000 in mode 0x7f failed.
Fri Sep 30 11:31:41 2022
NOTE: halting all I/Os to diskgroup 1 (DATA)
ERROR: No disks with F1X0 found on disk group DATA
NOTE: aborting instance recovery of domain 1 due to diskgroup dismount
NOTE: SMON skipping lock domain (1) validation because diskgroup being dismounted

数据库日志报错

Fri Sep 30 11:31:44 2022
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_lmon_26356.trc:
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
Fri Sep 30 11:31:45 2022
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_ckpt_26388.trc:
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.257.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_ckpt_26388.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.257.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
CKPT (ospid: 26388): terminating the instance due to error 221

通过kfed 查看asm disk被破坏情况

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1 
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F4FAAD45400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
7F4FAAD455B0 00000000 00000000 45C222C8 01000000  [.........".E....]
7F4FAAD455C0 FE830001 003FFFFF E9D60000 0000FFFF  [......?.........]
7F4FAAD455D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
7F4FAAD455F0 00000000 00000000 00000000 AA550000  [..............U.]
7F4FAAD45600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=2
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F64E77A0400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F64E77A1200 000081F9 000181F9 000281F9 000381F9  [................]
7F64E77A1210 000481F9 000C81F9 000D81F9 001881F9  [................]
7F64E77A1220 002881F9 003E81F9 007981F9 00AB81F9  [..(...>...y.....]
7F64E77A1230 013881F9 016C81F9 044581F9 04B081F9  [..8...l...E.....]
7F64E77A1240 061A81F9 0CD081F9 1E8481F9 00000000  [................]
7F64E77A1250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=3
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F8D101FF400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F8D10200200 000082F9 000182F9 000282F9 000382F9  [................]
7F8D10200210 000482F9 000C82F9 000D82F9 001882F9  [................]
7F8D10200220 002882F9 003E82F9 007982F9 00AB82F9  [..(...>...y.....]
7F8D10200230 013882F9 016C82F9 044582F9 04B082F9  [..8...l...E.....]
7F8D10200240 061A82F9 0CD082F9 1E8482F9 00000000  [................]
7F8D10200250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=4
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F142949C400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F142949D200 000083F9 000183F9 000283F9 000383F9  [................]
7F142949D210 000483F9 000C83F9 000D83F9 001883F9  [................]
7F142949D220 002883F9 003E83F9 007983F9 00AB83F9  [..(...>...y.....]
7F142949D230 013883F9 016C83F9 044583F9 04B083F9  [..8...l...E.....]
7F142949D240 061A83F9 0CD083F9 1E8483F9 00000000  [................]
7F142949D250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=5
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F0615CF6400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

磁盘前几个au被破坏严重.而且相关的备份block都已经损坏,基于这种情况,直接参考:
asm磁盘dd破坏恢复
asm disk header 彻底损坏恢复
asm disk 磁盘部分被清空恢复
通过底层恢复出来相关数据文件,并检测正常
20221002105544


进一步通过au分配列表获恢复redo,ctl等文件

H:\TEMP\asm-ext4\other>dir 
 驱动器 H 中的卷是 SSD-SX
 卷的序列号是 84EB-F434

 H:\TEMP\asm-ext4\other 的目录

2022-09-30  21:52        25,165,824 256.dd
2022-09-30  21:52        25,165,824 257.dd
2022-09-30  23:52        52,429,312 258.dd.1
2022-09-30  23:54        52,429,312 259.dd.1
2022-09-30  23:55        52,429,312 260.dd.1
2022-09-30  23:55        52,429,312 261.dd.1
2022-09-30  23:56        52,429,312 270.dd.1
2022-09-30  23:57        52,429,312 271.dd.1
2022-09-30  23:57        52,429,312 272.dd.1
2022-09-30  23:57        52,429,312 273.dd.1
2022-09-30  23:58        52,429,312 274.dd.1
2022-10-01  00:01        52,429,312 275.dd.1
2022-10-01  00:00        52,429,312 276.dd.1
2022-10-01  00:00        52,429,312 277.dd.1
2022-10-01  00:00        52,429,312 278.dd.1
2022-09-30  23:59        52,429,312 279.dd.1
2022-09-30  23:59        52,429,312 280.dd.1
2022-09-30  23:59        52,429,312 281.dd.1

在另外的新机器上尝试恢复库

[oracle@xifenfei ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sat Oct 1 10:18:58 2022

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1519898624 bytes
Fixed Size                  2253464 bytes
Variable Size             939527528 bytes
Database Buffers          570425344 bytes
Redo Buffers                7692288 bytes
ORA-00227: corrupt block detected in control file: (block 8, # blocks 1)
ORA-00202: control file: '/oradata/256.dd'

控制文件损坏,重建ctl

SQL> CREATE CONTROLFILE REUSE DATABASE "xifenfei" NORESETLOGS  NOARCHIVELOG  
  2      MAXLOGFILES 50  
  3      MAXLOGMEMBERS 5  
  4      MAXDATAFILES 100  
  5      MAXINSTANCES 8  
  6      MAXLOGHISTORY 226  
  7  LOGFILE  
  8  group 7  '/oradata/270.dd.1' size 50M,
  9  group 8  '/oradata/272.dd.1' size 50M,
 10  group 5  '/oradata/274.dd.1' size 50M,
 11  group 6  '/oradata/276.dd.1' size 50M,
 12  group 3  '/oradata/278.dd.1' size 50M,
 13  group 4  '/oradata/280.dd.1' size 50M,
 14  group 1  '/oradata/258.dd.1' size 50M,
 15  group 2  '/oradata/260.dd.1' size 50M
 16  DATAFILE  
 17  '/oradata/1',
 18  '/oradata/2',
 19  '/oradata/3',
 20  '/oradata/4',
 21  '/oradata/5',
 22  '/oradata/6',
 23  '/oradata/7',
 24  '/oradata/8',
 25  '/oradata/9',
 26  '/oradata/10',
 27  '/oradata/11'
 28  CHARACTER SET ZHS16GBK  
 29  ;  

Control file created.

尝试open库,报ORA-600 kqfidps_update_stats:2,ORA-600 4194等错误

SQL> recover database;
Media recovery complete.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kqfidps_update_stats:2],
[0x7FFCCBEB3EC0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4193], [19319], [l.ok

解决该异常,open数据库成功

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 1519898624 bytes
Fixed Size                  2253464 bytes
Variable Size             939527528 bytes
Database Buffers          570425344 bytes
Redo Buffers                7692288 bytes
Database mounted.
SQL> alter database open;

Database altered.

导出数据库,遭遇个别表如下ORA-08103和ORA-01555两种错误,这种是由于个别block在做成文件系统的时候被损坏,底层恢复的时候block被置空导致,对其异常表进行单独处理即可

. . 正在导出表                           ALBUM
EXP-00056: 遇到 ORACLE 错误 8103
ORA-08103: 对象不再存在

. . 正在导出表                  M_PUSH_CONTENT
EXP-00056: 遇到 ORACLE 错误 1555
ORA-01555: 快照过旧: 回退段号  (名称为 "") 过小
ORA-22924: 快照太旧

通过上述操作,实现客户数据的恢复,最大限度挽回客户损坏,再次提醒对于asm disk进行了误操作,建议第一时间保护现场(不要有任何的写入操作,可以最大限度恢复数据)

ORA-15335 ORA-15130 ORA-15066 ORA-15196

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15335 ORA-15130 ORA-15066 ORA-15196

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈,数据库无法正常启动,通过分析asm的alert日志发现,data磁盘组mount成功之后,没有一会儿自动dismount掉

Mon Sep 26 16:40:14 2022
SQL> /* ASMCMD */ALTER DISKGROUP data MOUNT  
NOTE: cache registered group DATA number=2 incarn=0x9dfa705f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x9dfa705f
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/DATA02)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATA01)
Mon Sep 26 16:40:20 2022
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 68 for pid 25, osid 14650
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/oracleasm/disks/DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/oracleasm/disks/DATA02
NOTE: cache mounting (first) external redundancy group 2/0x9DFA705F (DATA)
Mon Sep 26 16:40:20 2022
* allocate domain 2, invalid = TRUE 
kjbdomatt send to inst 2
Mon Sep 26 16:40:20 2022
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.321845
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Mon Sep 26 16:40:20 2022
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 20.3546
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.321845 ABA 21.3547
NOTE: cache mounting group 2/0x9DFA705F (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0x9dfa705f
Mon Sep 26 16:40:20 2022
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: /* ASMCMD */ALTER DISKGROUP data MOUNT 
Mon Sep 26 16:40:22 2022
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Mon Sep 26 16:40:47 2022
NOTE: client xff1:xff registered, osid 14742, mbr 0x0
Mon Sep 26 16:40:57 2022
WARNING: cache read  a corrupt block: group=2(DATA) dsk=1 blk=257 disk=1 (DATA_0001) 
incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=1 blk=257 
disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ERROR: cache failed to read group=2(DATA) dsk=1 blk=257 from disk(s): 1(DATA_0001)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process _user14778_+asm1 (14778) initiating offline of 
disk 1.3916071178 (DATA_0001) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 1/0xe96a810a, mask = 0x6a, op = clear
Mon Sep 26 16:40:58 2022
GMON updating disk modes for group 2 at 70 for pid 28, osid 14778
ERROR: Disk 1 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Mon Sep 26 16:40:58 2022
NOTE: cache dismounting (not clean) group 2/0x9DFA705F (DATA) 
WARNING: Offline for disk DATA_0001 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 14782, image: oracle@oracle11grac1 (B000)
Mon Sep 26 16:40:58 2022
NOTE: halting all I/Os to diskgroup 2 (DATA)
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc  (incident=144548):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
Incident details in: /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
Sweep [inc][144548]: completed
System State dumped to trace file /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
NOTE: AMDU dump of disk group DATA created at /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548
Mon Sep 26 16:41:00 2022
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=21.3550 last written ABA 21.3550
Mon Sep 26 16:41:00 2022
Sweep [inc2][144548]: completed
Mon Sep 26 16:41:00 2022
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x9dfa705f (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_5162.trc:
ORA-15130: diskgroup "DATA" is being dismounted

这里看主要是由于asm 磁盘组需要做COD recovery导致无法正常稳定的mount,主要原因是遭遇到asm disk的逻辑坏块(存储物理上看是ok的,但是实际数据在asm中看是异常的)

数据库alert日志报错

Mon Sep 26 16:40:52 2022
Successful mount of redo thread 1, with mount id 1097279951
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: alter database mount
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
LGWR: STARTING ARCH PROCESSES
Mon Sep 26 16:40:56 2022
ARC0 started with pid=40, OS id=14761 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Sep 26 16:40:57 2022
ARC1 started with pid=41, OS id=14764 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 1 (???? 1) ???
Mon Sep 26 16:40:57 2022
ARC2 started with pid=42, OS id=14766 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 2 (???? 1) ???
Mon Sep 26 16:40:57 2022
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Mon Sep 26 16:40:57 2022
ARC3 started with pid=44, OS id=14770 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc2_14766.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc1_14764.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc  (incident=180281):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Unable to create archive log file '+DATA'
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
*************************************************************
WARNING: A file of type ARCHIVED LOG may exist in
db_recovery_file_dest that is not known to the database.
Use the RMAN command CATALOG RECOVERY AREA to re-catalog
any such files. If files cannot be cataloged, then manually
delete them using OS command. This is most likely the
result of a crash during file creation.
*************************************************************
ARCH: Error 19504 Creating archive log file to '+DATA'
NOTE: Deferred communication with ASM instance
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-15130: diskgroup "DATA" is being dismounted
NOTE: deferred map free for map id 23
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-16038: log 1 sequence# 14235 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-00312: online log 1 thread 1: '+ARCH/xff/onlinelog/group_1.279.1025610217'
Mon Sep 26 16:40:58 2022
Sweep [inc][180281]: completed
Sweep [inc2][180281]: completed
USER (ospid: 14732): terminating the instance due to error 16038
Mon Sep 26 16:40:59 2022
System state dump requested by (instance=1, osid=14732), summary=[abnormal instance termination].
Instance terminated by USER, pid = 14732

对于这类故障处理相对比较容易,通过patch asm,让data磁盘组稳定mount,然后open库,迁移数据,实现数据0丢失,完美恢复