Physical devices in a RAID LV can fail or be lost for multiple
reasons. A device could fail permanently, or be disconnected
either permanently or temporarily. The purpose of RAID LVs (levels 1 and
higher) is to continue operating in a degraded mode, without
losing LV data, even after a device fails. The number of devices
that can fail without the loss of LV data depends on the RAID
level:
• RAID0 (striped) LVs cannot tolerate losing any devices.
LV data will be lost if any devices fail.
• RAID1 LVs can tolerate losing all but one device without
LV data loss.
• RAID4 and RAID5 LVs can tolerate losing one device without
LV data loss.
• RAID6 LVs can tolerate losing two devices without LV data
loss.
• RAID10 tolerance varies, depending on which devices are lost.
It stripes across multiple mirror groups, each with a raid1
layout, so it can tolerate losing all but one device in each
of these groups without LV data loss.
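The list above can be summarized as a small lookup. The following is a minimal sketch, not part of LVM: the function name raid_fault_tolerance and its arguments are hypothetical, and for raid10 it reports only the worst-case guarantee (all failures landing in one mirror group), assuming two copies per group unless told otherwise.

```shell
#!/bin/sh
# Hypothetical helper (not an LVM command): print how many device
# failures a RAID LV is guaranteed to survive without LV data loss.
# Usage: raid_fault_tolerance LEVEL DEVICES [MIRRORS]
raid_fault_tolerance() {
    level=$1
    devices=$2
    mirrors=${3:-2}   # assumption: copies per raid10 mirror group
    case "$level" in
        raid0)       echo 0 ;;                    # no redundancy
        raid1)       echo $((devices - 1)) ;;     # all but one device
        raid4|raid5) echo 1 ;;
        raid6)       echo 2 ;;
        raid10)      echo $((mirrors - 1)) ;;     # worst case: one group takes every failure
        *)           echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}

raid_fault_tolerance raid1 3      # prints 2
raid_fault_tolerance raid10 4 2   # prints 1
```

For raid10 the real limit can be higher than the printed value when the failed devices are spread across different mirror groups.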
If a RAID LV is missing devices, or has other device-related
problems, lvs reports this in the health_status (and attr)
fields:
lvs -o name,lv_health_status
partial
Devices are missing from the LV. This is also indicated
by the letter "p" (partial) in the 9th position of the lvs
attr field.
refresh needed
A device was temporarily missing but has returned. The LV
needs to be refreshed to use the device again (which will
usually require partial synchronization). This is also
indicated by the letter "r" (refresh needed) in the 9th
position of the lvs attr field. See Refreshing an LV.
This could also indicate a problem with the device, in
which case it should be replaced; see Replacing Devices.
mismatches exist
See Scrubbing.
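For scripting, the health letter can be read directly from the 9th position of the attr string. This is a minimal sketch: the function name lv_health_from_attr is hypothetical, the "ok" label for a "-" in that position is an assumption, and the letter "m" for mismatches is taken from lvs(8) rather than from the text above.

```shell
#!/bin/sh
# Hypothetical helper: map the 9th character of an lvs attr string
# to a health description.
lv_health_from_attr() {
    attr=$1
    health=$(printf '%s' "$attr" | cut -c9)
    case "$health" in
        p) echo "partial" ;;
        r) echo "refresh needed" ;;
        m) echo "mismatches exist" ;;   # letter per lvs(8), not stated above
        -) echo "ok" ;;                 # assumed label: no health flag set
        *) echo "other ($health)" ;;
    esac
}

lv_health_from_attr "Rwi-a-r-r-"   # prints: refresh needed
```

Where available, the lv_health_status field shown earlier is the simpler choice, since it reports the status as text and needs no position counting.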
Most commands will also print a warning if a device is missing,
e.g.
WARNING: Device for PV uItL3Z-wBME-DQy0-... not found or rejected ...
This warning will go away if the device returns or is removed
from the VG (see vgreduce --removemissing).
Activating an LV with missing devices
A RAID LV that is missing devices may be activated or not,
depending on the "activation mode" used in lvchange:
lvchange -ay --activationmode complete|degraded|partial LV
complete
The LV is only activated if all devices are present.
degraded
The LV is activated with missing devices if the RAID level
can tolerate the number of missing devices without LV data
loss.
partial
The LV is always activated, even if portions of the LV
data are missing because of the missing device(s). This
should only be used to perform extreme recovery or repair
operations.
Default activation mode when not specified by the command:
lvm.conf(5) activation/activation_mode
The default value is printed by:
# lvmconfig --type default activation/activation_mode
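As an illustration of choosing a mode explicitly, the sketch below validates the mode name and prints the resulting lvchange command instead of running it. The wrapper name activate_cmd is hypothetical, and vg/lv stands in for a real LV name.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: check the activation mode, then print
# the lvchange command rather than executing it.
activate_cmd() {
    mode=$1
    lv=$2
    case "$mode" in
        complete|degraded|partial)
            echo "lvchange -ay --activationmode $mode $lv" ;;
        *)
            echo "invalid activation mode: $mode" >&2
            return 1 ;;
    esac
}

activate_cmd degraded vg/lv
# prints: lvchange -ay --activationmode degraded vg/lv
```

Printing first makes it easy to review the command before running it as root, which matters most for the partial mode.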
Replacing Devices
Devices in a RAID LV can be replaced by other devices in the VG.
When replacing devices that are no longer visible on the system,
use lvconvert --repair. When replacing devices that are still
visible, use lvconvert --replace. The repair command will
attempt to restore the same number of data LVs that were
previously in the LV. The replace option can be repeated to
replace multiple PVs. Replacement devices can be optionally
listed with either option.
lvconvert --repair LV [NewPVs]
lvconvert --replace OldPV LV [NewPV]
lvconvert --replace OldPV1 --replace OldPV2 LV [NewPVs]
New devices require synchronization with existing devices.
See Synchronization.
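Because --replace is given once per PV, a script replacing several devices has to repeat the option. The following sketch builds such a command line and prints it without running it. The function name replace_cmd and the device paths /dev/sdb1 and /dev/sdc1 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build an lvconvert command that repeats
# --replace once for each old PV. Prints the command only.
# Usage: replace_cmd LV OldPV...
replace_cmd() {
    lv=$1
    shift
    args=""
    for pv in "$@"; do
        args="$args --replace $pv"
    done
    echo "lvconvert$args $lv"
}

replace_cmd vg/lv /dev/sdb1 /dev/sdc1
# prints: lvconvert --replace /dev/sdb1 --replace /dev/sdc1 vg/lv
```

Replacement PVs, if any, would be appended after the LV name, as in the synopsis above.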
Refreshing an LV
Refreshing a RAID LV clears any transient device failures (device
was temporarily disconnected) and returns the LV to its fully
redundant mode. Restoring a device will usually require at least
partial synchronization (see Synchronization). Failure to clear
a transient failure results in the RAID LV operating in degraded
mode until it is reactivated. Use the lvchange command to
refresh an LV:
lvchange --refresh LV
# lvs -o name,vgname,segtype,attr,size vg
LV VG Type  Attr       LSize
lv vg raid1 Rwi-a-r-r- 100.00g
# lvchange --refresh vg/lv
# lvs -o name,vgname,segtype,attr,size vg
LV VG Type  Attr       LSize
lv vg raid1 Rwi-a-r--- 100.00g
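When many LVs are involved, the same check can be scripted. The sketch below filters lvs-style output for LVs whose attr has "r" in the 9th position. The function name lvs_needing_refresh is hypothetical, and the sample input is made up for illustration. On a real system the input would come from lvs, as shown in the trailing comment.

```shell
#!/bin/sh
# Hypothetical filter: read "VG LV ATTR" lines on stdin and print
# VG/LV for every LV whose 9th attr character is "r".
lvs_needing_refresh() {
    while read -r vg lv attr; do
        if [ "$(printf '%s' "$attr" | cut -c9)" = "r" ]; then
            echo "$vg/$lv"
        fi
    done
}

# Made-up sample input in place of real lvs output.
printf '  vg lv1 Rwi-a-r-r-\n  vg lv2 rwi-a-r---\n' | lvs_needing_refresh
# prints: vg/lv1

# On a real system (as root), something like:
#   lvs --noheadings -o vg_name,lv_name,lv_attr | lvs_needing_refresh |
#       while read -r lv; do lvchange --refresh "$lv"; done
```

If the "r" flag returns after a refresh, the device itself may be faulty and should be replaced, as described under Replacing Devices.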
Automatic repair
If a device in a RAID LV fails, device-mapper in the kernel
notifies the dmeventd(8) monitoring process (see Monitoring).
dmeventd can be configured to automatically respond using:
lvm.conf(5) activation/raid_fault_policy
Possible settings are:
warn
A warning is added to the system log indicating that a
device has failed in the RAID LV. It is left to the user
to repair the LV, e.g. replace failed devices.
allocate
dmeventd automatically attempts to repair the LV using
spare devices in the VG. Note that even a transient
failure is treated as a permanent failure under this
setting. A new device is allocated and full
synchronization is started.
The specific command run by dmeventd(8) to warn or repair is:
lvconvert --repair --use-policies LV
Corrupted Data
Data on a device can be corrupted due to hardware errors without
the device ever being disconnected or there being any fault in
the software. This should be rare, and can be detected (see
Scrubbing).
Rebuild specific PVs
If specific PVs in a RAID LV are known to have corrupt data, the
data on those PVs can be reconstructed with:
lvchange --rebuild PV LV
The rebuild option can be repeated with different PVs to
reconstruct the data on multiple PVs.