To create a RAID LV, use lvcreate and specify an LV type. The LV
type corresponds to a RAID level. The basic RAID levels that can
be used are: raid0, raid1, raid4, raid5, raid6, raid10.
lvcreate --type RaidLevel [OPTIONS] --name Name --size Size VG [PVs]
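As a sketch of how the synopsis fits together, the shell function below only assembles (it does not run) an lvcreate command line for one of the basic RAID levels. The function name build_raid_lvcreate is hypothetical, and [OPTIONS] such as --stripes or --mirrors are deliberately left out.

```sh
#!/bin/sh
# Hypothetical helper: print (do not execute) an lvcreate command line
# following the synopsis. Arguments: level name size vg [pv...]
build_raid_lvcreate() {
    level=$1 name=$2 size=$3 vg=$4
    shift 4
    case $level in
        raid0|raid1|raid4|raid5|raid6|raid10) ;;
        *) echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
    # Remaining arguments, if any, are the optional PVs listed after the VG.
    echo "lvcreate --type $level --name $name --size $size $vg${1:+ $*}"
}

build_raid_lvcreate raid1 lvr1 10g vg /dev/sda /dev/sdb
```

Printing instead of executing keeps the sketch safe to try on a machine without spare devices.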
To display the LV type of an existing LV, run:
lvs -o name,segtype LV
(The LV type is also referred to as "segment type" or "segtype".)
LVs can be created with the following types:
raid0
Also called striping, raid0 spreads LV data across multiple
devices in units of stripe size. This is used to increase
performance. LV data will be lost if any of the devices fail.
lvcreate --type raid0 [--stripes Number --stripesize Size] VG [PVs]
--stripes Number
specifies the Number of devices to spread the LV across.
--stripesize Size
specifies the Size of each stripe in kilobytes. This is
the amount of data that is written to one device before
moving to the next.
PVs specifies the devices to use. If not specified, lvm will
choose Number devices, one for each stripe based on the number of
PVs available or supplied.
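To make "units of stripe size" concrete, here is a minimal sketch that maps a byte offset within a raid0 LV to the stripe (device index) holding it, assuming plain round-robin placement of stripesize-KiB chunks. The function name is hypothetical, and the sketch ignores any on-disk offsets lvm may add.

```sh
#!/bin/sh
# Sketch: 0-based device index holding a byte offset in a raid0 LV,
# assuming chunks of stripesize KiB are placed round-robin across stripes.
raid0_device_for_offset() {
    offset=$1 stripes=$2 stripesize_kib=$3
    chunk=$(( offset / (stripesize_kib * 1024) ))  # which chunk holds the byte
    echo $(( chunk % stripes ))                    # chunks rotate over devices
}

# 2 stripes, 64 KiB stripe size: bytes 65536..131071 land on device 1.
raid0_device_for_offset 70000 2 64
```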
raid1
Also called mirroring, raid1 uses multiple devices to duplicate
LV data. The LV data remains available if all but one of the
devices fail. The minimum number of devices (i.e. sub LV pairs)
required is 2.
lvcreate --type raid1 [--mirrors Number] VG [PVs]
--mirrors Number
specifies the Number of mirror images in addition to the
original LV image, e.g. --mirrors 1 means there are two
images of the data, the original and one mirror image.
PVs specifies the devices to use. If not specified, lvm will
choose Number+1 devices, one for each image.
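The relationship between --mirrors, the number of images, and how many device failures a raid1 LV survives can be written out as simple arithmetic. This is a sketch with a hypothetical function name; it only restates the rules above.

```sh
#!/bin/sh
# Sketch: raid1 image count and tolerated failures for --mirrors N.
# Images = N + 1 (the original plus N mirrors); data survives while
# one image remains, so up to N device failures are tolerated.
raid1_summary() {
    mirrors=$1
    if [ "$mirrors" -lt 1 ]; then
        echo "raid1 needs --mirrors 1 or more (2 devices minimum)" >&2
        return 1
    fi
    echo "images=$(( mirrors + 1 )) tolerated_failures=$mirrors"
}

raid1_summary 1   # images=2 tolerated_failures=1
```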
raid4
raid4 is a form of striping that uses an extra, first device
dedicated to storing parity blocks. The LV data remains
available if one device fails. The parity is used to recalculate
data that is lost from a single device. The minimum number of
devices required is 3.
lvcreate --type raid4 [--stripes Number --stripesize Size] VG [PVs]
--stripes Number
specifies the Number of devices to use for LV data. This
does not include the extra device lvm adds for storing
parity blocks. A raid4 LV with Number stripes requires
Number+1 devices. Number must be 2 or more.
--stripesize Size
specifies the Size of each stripe in kilobytes. This is
the amount of data that is written to one device before
moving to the next.
PVs specifies the devices to use. If not specified, lvm will
choose Number+1 separate devices.
raid4 is called non-rotating parity because the parity blocks are
always stored on the same device.
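Single-parity RAID conventionally computes the parity block as the bitwise XOR of the data blocks in a stripe, which is what lets one lost block be recalculated. The sketch below assumes XOR parity and uses small integers in place of whole blocks; it illustrates the idea rather than lvm's or the kernel's actual code.

```sh
#!/bin/sh
# Sketch: XOR parity over the values given as arguments. Integers stand in
# for data blocks; real RAID XORs entire blocks byte by byte.
xor_parity() {
    p=0
    for b in "$@"; do
        p=$(( p ^ $b ))
    done
    echo "$p"
}

# Two data blocks (stripes 2) plus parity, as on a minimal raid4 LV.
d0=90 d1=60
parity=$(xor_parity "$d0" "$d1")     # stored on the parity device
# If the device holding d1 fails, XOR of the survivors rebuilds it.
rebuilt=$(xor_parity "$d0" "$parity")
echo "parity=$parity rebuilt_d1=$rebuilt"
```

The same XOR reconstruction works no matter which single block is lost, because XOR-ing all surviving blocks of the stripe cancels out everything except the missing one.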
raid5
raid5 is a form of striping that uses an extra device for storing
parity blocks. LV data and parity blocks are stored on each
device, typically in a rotating pattern for performance reasons.
The LV data remains available if one device fails. The parity is
used to recalculate data that is lost from a single device. The
minimum number of devices required is 3 (unless converting a
two-legged raid1 LV and reshaping it to more stripes; see reshaping).
lvcreate --type raid5 [--stripes Number --stripesize Size] VG [PVs]
--stripes Number
specifies the Number of devices to use for LV data. This
does not include the extra device lvm adds for storing
parity blocks. A raid5 LV with Number stripes requires
Number+1 devices. Number must be 2 or more.
--stripesize Size
specifies the Size of each stripe in kilobytes. This is
the amount of data that is written to one device before
moving to the next.
PVs specifies the devices to use. If not specified, lvm will
choose Number+1 separate devices.
raid5 is called rotating parity because the parity blocks are
placed on different devices in a round-robin sequence. There are
variations of raid5 with different algorithms for placing the
parity blocks. The default variant is raid5_ls (raid5 left
symmetric, which is a rotating parity 0 with data restart). See
RAID5 VARIANTS below.
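To picture rotating parity, the sketch below prints, for one stripe row, which device holds parity and the order in which data chunks are placed on the others. It is modeled on the Linux md "left-symmetric" algorithm (parity for row r on device (n-1) - (r mod n), data restarting on the device after parity); treat that exact placement as an assumption rather than a description of every lvm variant. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch of rotating parity for one stripe row on n devices, modeled on
# md's left-symmetric algorithm (assumption): parity device for row r is
# (n-1) - (r mod n), and data chunks restart on the device after parity.
raid5_row_layout() {
    row=$1 n=$2
    pd=$(( (n - 1) - row % n ))
    out="P=$pd D=" sep="" i=0
    while [ "$i" -lt $(( n - 1 )) ]; do
        out="$out$sep$(( (pd + 1 + i) % n ))"   # next data device
        sep=","
        i=$(( i + 1 ))
    done
    echo "$out"
}

# 3 devices (stripes 2): parity moves 2 -> 1 -> 0, then repeats.
for r in 0 1 2 3; do raid5_row_layout "$r" 3; done
```

Because the parity device changes every row, parity writes are spread over all devices instead of concentrating on one, which is the performance difference from raid4.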
raid6
raid6 is a form of striping like raid5, but uses two extra
devices for parity blocks. LV data and parity blocks are stored
on each device, typically in a rotating pattern for performance
reasons. The LV data remains available if up to two devices
fail. The parity is used to recalculate data that is lost from
one or two devices. The minimum number of devices required is 5.
lvcreate --type raid6 [--stripes Number --stripesize Size] VG [PVs]
--stripes Number
specifies the Number of devices to use for LV data. This
does not include the extra two devices lvm adds for
storing parity blocks. A raid6 LV with Number stripes
requires Number+2 devices. Number must be 3 or more.
--stripesize Size
specifies the Size of each stripe in kilobytes. This is
the amount of data that is written to one device before
moving to the next.
PVs specifies the devices to use. If not specified, lvm will
choose Number+2 separate devices.
Like raid5, there are variations of raid6 with different
algorithms for placing the parity blocks. The default variant is
raid6_zr (raid6 zero restart, aka left symmetric, which is a
rotating parity 0 with data restart). See RAID6 VARIANTS below.
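The device count and usable capacity of a raid6 LV follow directly from the rules above. A sketch, assuming equal-sized devices and ignoring the small space used by RAID metadata; the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: raid6 device count and usable size for --stripes N, assuming
# equal-sized devices (size given in GiB) and ignoring rmeta space.
raid6_layout() {
    stripes=$1 dev_gib=$2
    if [ "$stripes" -lt 3 ]; then
        echo "raid6 needs --stripes 3 or more" >&2
        return 1
    fi
    # N data stripes plus two devices' worth of parity; two failures tolerated.
    echo "devices=$(( stripes + 2 )) usable_gib=$(( stripes * dev_gib ))"
}

raid6_layout 3 100   # devices=5 usable_gib=300
```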
raid10
raid10 is a combination of raid1 and raid0, striping data across
mirrored devices. LV data remains available if at least one
device remains in each mirror set. The minimum number of
devices required is 4.
lvcreate --type raid10 [--mirrors NumberMirrors] [--stripes NumberStripes --stripesize Size] VG [PVs]
--mirrors NumberMirrors
specifies the number of mirror images within each stripe,
e.g. --mirrors 1 means there are two images of the data,
the original and one mirror image.
--stripes NumberStripes
specifies the number of stripes, i.e. the number of raid1
mirror sets to spread the LV data across. The total number
of devices used is NumberStripes*(NumberMirrors+1), e.g.
mirrors 1 and stripes 2 will stripe data across two raid1
mirrors, where each mirror is 2 devices, for 4 devices in
total.
--stripesize Size
specifies the Size of each stripe in kilobytes. This is
the amount of data that is written to one device before
moving to the next.
PVs specifies the devices to use. If not specified, lvm will
choose the necessary devices. Devices are used to create mirrors
in the order listed, e.g. for mirrors 1, stripes 2, listing PV1
PV2 PV3 PV4 results in mirrors PV1/PV2 and PV3/PV4.
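The ordering rule for raid10 PVs can be sketched as grouping the listed PVs, in order, into mirror sets of NumberMirrors+1 devices each. The function below is a hypothetical illustration of that rule only; it does not query lvm.

```sh
#!/bin/sh
# Sketch: group PVs (in the order listed) into raid1 mirror sets of
# mirrors+1 devices each, following the raid10 PV ordering rule.
raid10_mirror_sets() {
    mirrors=$1
    shift
    per_set=$(( mirrors + 1 ))
    if [ $(( $# % per_set )) -ne 0 ]; then
        echo "PV count must be a multiple of $per_set" >&2
        return 1
    fi
    out="" grp="" count=0
    for pv in "$@"; do
        grp="${grp:+$grp/}$pv"
        count=$(( count + 1 ))
        if [ "$count" -eq "$per_set" ]; then
            out="${out:+$out }$grp"   # close this mirror set
            grp="" count=0
        fi
    done
    echo "$out"
}

raid10_mirror_sets 1 PV1 PV2 PV3 PV4   # PV1/PV2 PV3/PV4
```

Listing PVs in a deliberate order is therefore a way to keep the two halves of each mirror on different controllers or enclosures.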
RAID10 is not mirroring on top of stripes; that layout would be
RAID01, which is less tolerant of device failures.
Configuration Options
There are a number of options in the LVM configuration file that
affect the behavior of RAID LVs. The tunable options are listed
below. A detailed description of each can be found in the LVM
configuration file itself.
mirror_segtype_default
raid10_segtype_default
raid_region_size
raid_fault_policy
activation_mode
Monitoring
When a RAID LV is activated, the dmeventd(8) process is started to
monitor the health of the LV. Various events detected in the
kernel can cause a notification to be sent from device-mapper to
the monitoring process, including device failures and
synchronization completion (e.g. for initialization or
scrubbing).
The LVM configuration file contains options that affect how the
monitoring process will respond to failure events (e.g.
raid_fault_policy). It is possible to turn on and off monitoring
with lvchange, but it is not recommended to turn this off unless
you have a thorough knowledge of the consequences.
Synchronization
Synchronization is the process that makes all the devices in a
RAID LV consistent with each other.
In a RAID1 LV, all mirror images should have the same data. When
a new mirror image is added, or a mirror image is missing data,
then images need to be synchronized. Data blocks are copied from
an existing image to a new or outdated image to make them match.
In a RAID 4/5/6 LV, parity blocks and data blocks should match
based on the parity calculation. When the devices in a RAID LV
change, the data and parity blocks can become inconsistent and
need to be synchronized. Correct blocks are read, parity is
calculated, and recalculated blocks are written.
The RAID implementation keeps track of which parts of a RAID LV
are synchronized. When a RAID LV is first created and activated
the first synchronization is called initialization. A pointer
stored in the RAID metadata keeps track of the initialization
process, allowing it to be restarted after the RAID LV is
deactivated or after a crash. Any write to the RAID LV dirties
the corresponding region of the write-intent bitmap, which allows
fast recovery of those regions after a crash. Without this, the
entire LV would need to be synchronized every time it was
activated.
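The region bookkeeping can be illustrated with a small calculation: a write marks every bitmap region it touches as dirty, and only those regions need resynchronizing after a crash. The sketch assumes all sizes are in KiB and uses 2048 KiB as an example region size (compare raid_region_size under Configuration Options); the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: range of write-intent bitmap regions dirtied by one write.
# All sizes are in KiB; a write of len KiB starting at offset touches
# regions floor(offset/region) through floor((offset+len-1)/region).
dirty_regions() {
    offset_kib=$1 len_kib=$2 region_kib=$3
    first=$(( offset_kib / region_kib ))
    last=$(( (offset_kib + len_kib - 1) / region_kib ))
    echo "$first-$last"
}

dirty_regions 2046 4 2048   # 0-1: a 4 KiB write straddling a region boundary
```

Larger regions mean a smaller bitmap but more data to resynchronize for each dirtied region.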
Automatic synchronization happens when a RAID LV is activated,
but it is usually partial because the bitmaps reduce the areas
that are checked. A full sync becomes necessary when devices in
the RAID LV are replaced.
The synchronization status of a RAID LV is reported by the
following command, where "Cpy%Sync" = "100%" means sync is
complete:
lvs -a -o name,sync_percent
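In a script, the sync percent can be tested before proceeding. A minimal sketch, assuming the value is obtained with something like `lvs --noheadings -o sync_percent vg/lv`, which may print leading spaces and a value such as 100.00; the function name and vg/lv are placeholders.

```sh
#!/bin/sh
# Sketch: succeed only when an lvs sync_percent value means 100% synced.
# Spaces are stripped because lvs pads its columns.
is_fully_synced() {
    pct=$(printf '%s' "$1" | tr -d ' ')
    case $pct in
        100|100.0|100.00) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed use: is_fully_synced "$(lvs --noheadings -o sync_percent vg/lv)"
is_fully_synced "  100.00" && echo synced
```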
Scrubbing
Scrubbing is a full scan of the RAID LV requested by a user.
Scrubbing can find problems that are missed by partial
synchronization.
Scrubbing assumes that RAID metadata and bitmaps may be
inaccurate, so it verifies all RAID metadata, LV data, and parity
blocks. Scrubbing can find inconsistencies caused by hardware
errors or degradation. These kinds of problems may be undetected
by automatic synchronization which excludes areas outside of the
RAID write-intent bitmap.
The command to scrub a RAID LV can operate in two different
modes:
lvchange --syncaction check|repair LV
check
Check mode is read-only and only detects inconsistent
areas in the RAID LV; it does not correct them.
repair
Repair mode checks and writes corrected blocks to
synchronize any inconsistent areas.
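Since only these two modes exist, a wrapper can refuse anything else before invoking lvchange. This sketch prints the command instead of running it; the function name and the vg/lv argument are placeholders.

```sh
#!/bin/sh
# Sketch: validate the scrub mode and print the lvchange command for it.
scrub_cmd() {
    mode=$1 lv=$2
    case $mode in
        check|repair) echo "lvchange --syncaction $mode $lv" ;;
        *) echo "syncaction must be check or repair" >&2; return 1 ;;
    esac
}

scrub_cmd check vg/lv   # lvchange --syncaction check vg/lv
```

Running check first and repair only after reviewing the mismatch count avoids rewriting blocks unnecessarily.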
Scrubbing can consume a lot of bandwidth and slow down
application I/O on the RAID LV. To control the I/O rate used for
scrubbing, use:
--maxrecoveryrate Size[k|UNIT]
Sets the maximum recovery rate for a RAID LV. Size is
specified as an amount per second for each device in the
array. If no suffix is given, then KiB/sec/device is
used. Setting the recovery rate to 0 means it will be
unbounded.
--minrecoveryrate Size[k|UNIT]
Sets the minimum recovery rate for a RAID LV. Size is
specified as an amount per second for each device in the
array. If no suffix is given, then KiB/sec/device is
used. Setting the recovery rate to 0 means it will be
unbounded.
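Because the rate is per device, the total recovery bandwidth scales with the number of devices in the array. The sketch below converts a rate to KiB/s per device and multiplies by the device count. It assumes a bare number means KiB, and that k and m suffixes mean KiB and MiB as in lvm's other size arguments; other units are rejected. Function names are hypothetical.

```sh
#!/bin/sh
# Sketch: convert a recovery rate to KiB/s per device.
# Assumed suffixes: none or k = KiB, m = MiB; other units are rejected.
rate_kib_per_dev() {
    case $1 in
        *[kK]) echo "${1%?}" ;;
        *[mM]) echo $(( ${1%?} * 1024 )) ;;
        *[!0-9]*) echo "unsupported unit: $1" >&2; return 1 ;;
        *) echo "$1" ;;
    esac
}

# Total across the array; 0 means the rate is unbounded.
rate_total_kib() {
    per_dev=$(rate_kib_per_dev "$1") || return 1
    if [ "$per_dev" -eq 0 ]; then
        echo unbounded
    else
        echo $(( per_dev * $2 ))
    fi
}

rate_total_kib 4m 4   # 16384 KiB/s across 4 devices
```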
To display the current scrubbing in progress on an LV, including
the syncaction mode and percent complete, run:
lvs -a -o name,raid_sync_action,sync_percent
After scrubbing is complete, to display the number of
inconsistent blocks found, run:
lvs -o name,raid_mismatch_count
Also, if mismatches were found, the lvs attr field will display
the letter "m" (mismatch) in the 9th position, e.g.
# lvs -o name,vgname,segtype,attr vg/lv
LV VG Type Attr
lv vg raid1 Rwi-a-r-m-
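A script can test for the mismatch flag by looking at the 9th character of the attr field. A minimal sketch; the function name is hypothetical, and padding spaces from lvs output are stripped first.

```sh
#!/bin/sh
# Sketch: succeed if an lvs attr string has "m" (mismatch) in position 9.
has_mismatch() {
    attr=$(printf '%s' "$1" | tr -d ' ')
    [ "$(printf '%s' "$attr" | cut -c9)" = "m" ]
}

has_mismatch "Rwi-a-r-m-" && echo "mismatches found"
```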
Scrubbing Limitations
The check mode can only report the number of inconsistent blocks,
it cannot report which blocks are inconsistent. This makes it
impossible to know which device has errors, or if the errors
affect file system data, metadata or nothing at all.
The repair mode can make the RAID LV data consistent, but it does
not know which data is correct. The result may be consistent but
incorrect data. When two different blocks of data must be made
consistent, it chooses the block from the device that would be
used during RAID initialization. However, if the PV holding
corrupt data is known, lvchange --rebuild can be used in place of
scrubbing to reconstruct the data on the bad device.
Future developments might include:
Allowing a user to choose the correct version of data during
repair.
Using a majority of devices to determine the correct version of
data to use in a 3-way RAID1 or RAID6 LV.
Using a checksumming device to pin-point when and where an error
occurs, allowing it to be rewritten.
SubLVs
An LV is often a combination of other hidden LVs called SubLVs.
The SubLVs either use physical devices, or are built from other
SubLVs themselves. SubLVs hold LV data blocks, RAID parity
blocks, and RAID metadata. SubLVs are generally hidden, so the
lvs -a option is required to display them:
lvs -a -o name,segtype,devices
SubLV names begin with the visible LV name, and have an automatic
suffix indicating its role:
• SubLVs holding LV data or parity blocks have the suffix
_rimage_#.
These SubLVs are sometimes referred to as DataLVs.
• SubLVs holding RAID metadata have the suffix _rmeta_#.
RAID metadata includes superblock information, RAID type,
bitmap, and device health information.
These SubLVs are sometimes referred to as MetaLVs.
SubLVs are an internal implementation detail of LVM. The way
they are used, constructed and named may change.
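The naming convention lets a script sort hidden SubLVs by role. The sketch below classifies a name from `lvs -a` output by its suffix; since SubLV naming may change, as noted above, treat it as illustrative only. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch: classify a SubLV name (brackets from lvs -a are optional)
# as DataLV (_rimage_#) or MetaLV (_rmeta_#).
sublv_role() {
    name=${1#\[}
    name=${name%\]}
    case $name in
        *_rimage_[0-9]*) echo DataLV ;;
        *_rmeta_[0-9]*) echo MetaLV ;;
        *) echo other ;;
    esac
}

sublv_role "[lvr1_rimage_0]"   # DataLV
```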
The following examples show the SubLV arrangement for each of the
basic RAID LV types, using the fewest number of devices allowed
for each.
Examples
raid0
Each rimage SubLV holds a portion of LV data. No parity is used.
No RAID metadata is used.
# lvcreate --type raid0 --stripes 2 --name lvr0 ...
# lvs -a -o name,segtype,devices
lvr0 raid0 lvr0_rimage_0(0),lvr0_rimage_1(0)
[lvr0_rimage_0] linear /dev/sda(...)
[lvr0_rimage_1] linear /dev/sdb(...)
raid1
Each rimage SubLV holds a complete copy of LV data. No parity is
used. Each rmeta SubLV holds RAID metadata.
# lvcreate --type raid1 --mirrors 1 --name lvr1 ...
# lvs -a -o name,segtype,devices
lvr1 raid1 lvr1_rimage_0(0),lvr1_rimage_1(0)
[lvr1_rimage_0] linear /dev/sda(...)
[lvr1_rimage_1] linear /dev/sdb(...)
[lvr1_rmeta_0] linear /dev/sda(...)
[lvr1_rmeta_1] linear /dev/sdb(...)
raid4
At least three rimage SubLVs are used: two or more each hold a
portion of LV data, and one holds parity. Each rmeta SubLV holds
RAID metadata.
# lvcreate --type raid4 --stripes 2 --name lvr4 ...
# lvs -a -o name,segtype,devices
lvr4 raid4 lvr4_rimage_0(0),\
lvr4_rimage_1(0),\
lvr4_rimage_2(0)
[lvr4_rimage_0] linear /dev/sda(...)
[lvr4_rimage_1] linear /dev/sdb(...)
[lvr4_rimage_2] linear /dev/sdc(...)
[lvr4_rmeta_0] linear /dev/sda(...)
[lvr4_rmeta_1] linear /dev/sdb(...)
[lvr4_rmeta_2] linear /dev/sdc(...)
raid5
At least three rimage SubLVs each typically hold a portion of LV
data and parity (see section on raid5). Each rmeta SubLV holds
RAID metadata.
# lvcreate --type raid5 --stripes 2 --name lvr5 ...
# lvs -a -o name,segtype,devices
lvr5 raid5 lvr5_rimage_0(0),\
lvr5_rimage_1(0),\
lvr5_rimage_2(0)
[lvr5_rimage_0] linear /dev/sda(...)
[lvr5_rimage_1] linear /dev/sdb(...)
[lvr5_rimage_2] linear /dev/sdc(...)
[lvr5_rmeta_0] linear /dev/sda(...)
[lvr5_rmeta_1] linear /dev/sdb(...)
[lvr5_rmeta_2] linear /dev/sdc(...)
raid6
At least five rimage SubLVs each typically hold a portion of LV
data and parity (see section on raid6). Each rmeta SubLV holds
RAID metadata.
# lvcreate --type raid6 --stripes 3 --name lvr6 ...
# lvs -a -o name,segtype,devices
lvr6 raid6 lvr6_rimage_0(0),\
lvr6_rimage_1(0),\
lvr6_rimage_2(0),\
lvr6_rimage_3(0),\
lvr6_rimage_4(0)
[lvr6_rimage_0] linear /dev/sda(...)
[lvr6_rimage_1] linear /dev/sdb(...)
[lvr6_rimage_2] linear /dev/sdc(...)
[lvr6_rimage_3] linear /dev/sdd(...)
[lvr6_rimage_4] linear /dev/sde(...)
[lvr6_rmeta_0] linear /dev/sda(...)
[lvr6_rmeta_1] linear /dev/sdb(...)
[lvr6_rmeta_2] linear /dev/sdc(...)
[lvr6_rmeta_3] linear /dev/sdd(...)
[lvr6_rmeta_4] linear /dev/sde(...)
raid10
At least four rimage SubLVs each hold a portion of LV data. No
parity is used. Each rmeta SubLV holds RAID metadata.
# lvcreate --type raid10 --stripes 2 --mirrors 1 --name lvr10 ...
# lvs -a -o name,segtype,devices
lvr10 raid10 lvr10_rimage_0(0),\
lvr10_rimage_1(0),\
lvr10_rimage_2(0),\
lvr10_rimage_3(0)
[lvr10_rimage_0] linear /dev/sda(...)
[lvr10_rimage_1] linear /dev/sdb(...)
[lvr10_rimage_2] linear /dev/sdc(...)
[lvr10_rimage_3] linear /dev/sdd(...)
[lvr10_rmeta_0] linear /dev/sda(...)
[lvr10_rmeta_1] linear /dev/sdb(...)
[lvr10_rmeta_2] linear /dev/sdc(...)
[lvr10_rmeta_3] linear /dev/sdd(...)