Monday, November 16, 2015

ASM Redundancy and Mirroring

An ASM disk group has one of three redundancy levels:
  • Normal
ASM provides two-way mirroring by default. A loss of one ASM disk is tolerated. You can optionally choose three-way or unprotected mirroring for a file in a NORMAL redundancy disk group. A file specified with HIGH redundancy (three-way mirroring) in a NORMAL redundancy disk group gains additional protection from a bad disk sector, not protection from a disk failure.
  • High
ASM provides triple mirroring by default. A loss of two ASM disks in different failure groups is tolerated.
  • External
ASM does not provide mirroring redundancy and relies on the storage system to provide RAID functionality. Any write error causes a forced dismount of the disk group. All disks must be located to successfully mount the disk group.
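The three levels can be summarized in a small lookup. The following is a minimal Python sketch of the defaults described above, not an Oracle API: the `Redundancy` enum and both helper functions are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Redundancy(Enum):
    """Disk group redundancy level; the value is the default number of copies ASM keeps."""
    EXTERNAL = 1  # no ASM mirroring, the storage system provides RAID
    NORMAL = 2    # two-way mirroring by default
    HIGH = 3      # three-way mirroring by default


def default_copies(level: Redundancy) -> int:
    """Number of copies of each extent ASM keeps by default."""
    return level.value


def tolerated_disk_losses(level: Redundancy) -> int:
    """Disk losses (in different failure groups) that leave one copy intact."""
    return level.value - 1


if __name__ == "__main__":
    for level in Redundancy:
        print(level.name, default_copies(level), tolerated_disk_losses(level))
```

The model covers only the disk group defaults. It does not cover the per-file overrides that a NORMAL redundancy disk group allows.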

Failure Group

When ASM allocates an extent for a mirrored file, it allocates a primary copy and a mirror copy. ASM stores the mirror copy on a disk in a different failure group from the primary copy.
A failure group is a subset of the disks in a disk group that could fail at the same time because they share hardware. The failure of common hardware must be tolerated. The simultaneous failure of all disks in a failure group does not result in data loss, because the mirror copies of the data on those disks are in different failure groups. A NORMAL redundancy disk group must contain at least two failure groups. A HIGH redundancy disk group must contain at least three failure groups. There are always failure groups, even if they are not explicitly created. If you do not specify a failure group for a disk, Oracle automatically creates a new failure group containing just that disk, except for disk groups containing disks on Oracle Exadata cells.
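The placement rule can be illustrated with a toy model. The Python sketch below assumes a disk group is simply a mapping from disk name to failure group name. The functions `place_extent` and `survives_failure` are hypothetical, and choosing the first eligible disk in name order is an arbitrary choice for illustration, not ASM's actual selection policy.

```python
def place_extent(disks: dict[str, str], primary_disk: str, copies: int) -> list[str]:
    """Return the disks holding each copy of an extent, one failure group per copy.

    disks maps a disk name to its failure group name (toy model).
    """
    placement = [primary_disk]
    used_groups = {disks[primary_disk]}
    for disk, group in sorted(disks.items()):
        if len(placement) == copies:
            break
        if group not in used_groups:
            # A mirror copy must live in a failure group not used so far.
            placement.append(disk)
            used_groups.add(group)
    if len(placement) < copies:
        raise ValueError("not enough failure groups for the requested redundancy")
    return placement


def survives_failure(disks: dict[str, str], placement: list[str], failed_group: str) -> bool:
    """True if at least one copy lies outside the failed failure group."""
    return any(disks[disk] != failed_group for disk in placement)


if __name__ == "__main__":
    disks = {"d1": "fgA", "d2": "fgA", "d3": "fgB", "d4": "fgC"}
    placement = place_extent(disks, "d1", copies=2)
    print(placement, survives_failure(disks, placement, "fgA"))
```

Requesting three copies from a disk group with only two failure groups raises an error in this model. This mirrors the requirement that a HIGH redundancy disk group contain at least three failure groups.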

Disk Failure

When one or more disks fail, the disks are first taken offline and then automatically dropped. In this case the disk group remains mounted and serviceable. In addition, because of mirroring, all of the disk group data remains accessible. After the disk drop operation, ASM performs a rebalance to restore full redundancy for the data on the failed disks.

Recovery from Read or Write I/O Errors

When a read error happens, it triggers the Oracle ASM instance to attempt bad block remapping. ASM then reads a good copy of the extent and copies it to the disk that had the read error. If the write to the same location succeeds, the underlying allocation unit is deemed healthy. If the write fails, ASM attempts to write the extent to a new allocation unit on the same disk. If this write succeeds, the original allocation unit is marked as unusable. If the write fails, the disk is taken offline.
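The remapping steps form a short decision chain. The Python sketch below models only that chain. The function `remap_after_read_error`, its two callback parameters, and the returned strings are hypothetical and stand in for the actual I/O attempts.

```python
from typing import Callable


def remap_after_read_error(
    write_same_location: Callable[[], bool],
    write_new_allocation_unit: Callable[[], bool],
) -> str:
    """Model the outcome of bad block remapping after a read error.

    Each callback represents one write attempt and returns True on success.
    """
    if write_same_location():
        # Rewriting the good copy in place worked.
        return "allocation unit healthy"
    if write_new_allocation_unit():
        # The data now lives in a new allocation unit on the same disk.
        return "original allocation unit marked unusable"
    # Neither write succeeded.
    return "disk taken offline"


if __name__ == "__main__":
    print(remap_after_read_error(lambda: False, lambda: True))
```

The second write is attempted only when the first fails, matching the order of the steps above.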
One unique benefit of Oracle ASM-based mirroring is that the database instance is aware of the mirroring. For many types of logical corruption, such as a bad checksum or an incorrect System Change Number (SCN), the database instance works through the mirror sides looking for valid content and proceeds without errors.
When a write error happens, the database instance sends the ASM instance a disk offline message. If the database can successfully complete a write to at least one extent copy and receives acknowledgment of the offline disk from ASM, the write is considered successful. If the write to all mirror sides fails, the database takes the appropriate action in response to a write error, such as taking the tablespace offline.
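The write rule can be expressed as a small classification function. This Python sketch is a simplified model, with `classify_write` and its return strings being hypothetical. The "waiting for ASM acknowledgment" outcome in particular is an assumption: the text above does not say what happens when a copy was written but ASM has not yet acknowledged the offline disk.

```python
def classify_write(copy_ok: list[bool], offline_acknowledged: bool) -> str:
    """Classify a mirrored write from the per-copy results.

    copy_ok holds one bool per extent copy (True means the write succeeded).
    offline_acknowledged is True once ASM has acknowledged the offline disk.
    """
    if not copy_ok:
        raise ValueError("at least one extent copy is required")
    if all(copy_ok):
        return "success"  # no write error occurred at all
    if any(copy_ok):
        # At least one copy was written; success also requires the ASM acknowledgment.
        if offline_acknowledged:
            return "success"
        return "waiting for ASM acknowledgment"  # assumption, not stated in the text
    # Every mirror side failed, e.g. the database may take the tablespace offline.
    return "write error"


if __name__ == "__main__":
    print(classify_write([True, False], offline_acknowledged=True))
```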
