LVM locking daemon
TOPICS
Protecting VGs on shared devices
The following terms are used to describe the different ways of
accessing VGs on shared devices.
shared VG
A shared VG exists on shared storage that is visible to multiple
hosts. LVM acquires locks through lvmlockd to coordinate access
to shared VGs. A shared VG has lock_type "dlm" or "sanlock",
which specifies the lock manager lvmlockd will use.
When the lock manager for the lock type is not available (e.g.
not started or failed), lvmlockd is unable to acquire locks for
LVM commands. In this situation, LVM commands are only allowed
to read and display the VG; changes and activation will fail.
local VG
A local VG is meant to be used by a single host. It has no lock
type or lock type "none". A local VG typically exists on local
(non-shared) devices and cannot be used concurrently from
different hosts.
If a local VG does exist on shared devices, it should be owned by
a single host by having the system ID set, see lvmsystemid(7).
The host with a matching system ID can use the local VG and other
hosts will ignore it. A VG with no lock type and no system ID
should be excluded from all but one host using lvm.conf filters.
Without any of these protections, a local VG on shared devices
can be easily damaged or destroyed.
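For example, a minimal sketch of excluding such a VG's device on all but
one host with an lvm.conf filter (the device path is hypothetical):
devices {
    filter = [ "r|^/dev/sdb$|", "a|.*/|" ]
}
Setting the system ID on the owning host is described in lvmsystemid(7).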
clvm VG
A clvm VG (or clustered VG) is a VG on shared storage (like a
shared VG) that requires clvmd for clustering and locking. See
below for converting a clvm/clustered VG to a shared VG.
Shared VGs from hosts not using lvmlockd
Hosts that do not use shared VGs will not be running lvmlockd.
In this case, shared VGs that are still visible to the host will
be ignored (like foreign VGs, see lvmsystemid(7)).
The --shared option for reporting and display commands causes
shared VGs to be displayed on a host not using lvmlockd, like the
--foreign option does for foreign VGs.
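For example, a host that is not running lvmlockd can still list shared VGs
with a reporting command such as:
vgs --shared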
Creating the first sanlock VG
When use_lvmlockd is first enabled in lvm.conf, and before the
first sanlock VG is created, no global lock will exist. In this
initial state, LVM commands try and fail to acquire the global
lock, producing a warning, and some commands are disallowed.
Once the first sanlock VG is created, the global lock will be
available, and LVM will be fully operational.
When a new sanlock VG is created, its lockspace is automatically
started on the host that creates it. Other hosts need to run
'vgchange --lock-start' to start the new VG before they can use
it.
Creating the first sanlock VG is not protected by locking, so it
requires special attention. This is because sanlock locks exist
on storage within the VG, so they are not available until after
the VG is created. The first sanlock VG that is created will
automatically contain the "global lock". Be aware of the
following special considerations:
• The first vgcreate command needs to be given the path to a
device that has not yet been initialized with pvcreate. The
pvcreate initialization will be done by vgcreate. This is
because the pvcreate command requires the global lock, which
will not be available until after the first sanlock VG is
created.
• Because the first sanlock VG will contain the global lock, this
VG needs to be accessible to all hosts that will use sanlock
shared VGs. All hosts will need to use the global lock from
the first sanlock VG.
• The device and VG name used by the initial vgcreate will not be
protected from concurrent use by another vgcreate on another
host.
See below for more information about managing the sanlock global
lock.
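As an illustration only (the VG name and device path are hypothetical, and
the device must not already be initialized with pvcreate), the first
sanlock VG might be created on one host and then started on each other
host with:
vgcreate --shared vg0 /dev/sdb
vgchange --lock-start vg0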
Using shared VGs
In the 'vgs' command, shared VGs are indicated by "s" (for
shared) in the sixth attr field, and by "shared" in the
"--options shared" report field. The specific lock type and lock
args for a shared VG can be displayed with 'vgs
-o+locktype,lockargs'.
Shared VGs need to be "started" and "stopped", unlike other types
of VGs. See the following section for a full description of
starting and stopping.
Removing a shared VG will fail if other hosts have the VG
started. Run vgchange --lock-stop <vgname> on all other hosts
before vgremove. (It may take several seconds before vgremove
recognizes that all hosts have stopped a sanlock VG.)
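A minimal sketch of that sequence (the VG name is hypothetical):
vgchange --lock-stop vg0     (run on every other host)
vgremove vg0                 (run on the remaining host)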
Starting and stopping VGs
Starting a shared VG (vgchange --lock-start) causes the lock
manager to start (join) the lockspace for the VG on the host
where it is run. This makes locks for the VG available to LVM
commands on the host. Before a VG is started, only LVM commands
that read/display the VG are allowed to continue without locks
(and with a warning).
Stopping a shared VG (vgchange --lock-stop) causes the lock
manager to stop (leave) the lockspace for the VG on the host
where it is run. This makes locks for the VG inaccessible to the
host. A VG cannot be stopped while it has active LVs.
When using the lock type sanlock, starting a VG can take a long
time (potentially minutes if the host was previously shut down
without cleanly stopping the VG).
A shared VG can be started after all the following are true:
• lvmlockd is running
• the lock manager is running
• the VG's devices are visible on the system
A shared VG can be stopped if all LVs are deactivated.
All shared VGs can be started/stopped using:
vgchange --lock-start
vgchange --lock-stop
Individual VGs can be started/stopped using:
vgchange --lock-start <vgname> ...
vgchange --lock-stop <vgname> ...
To make vgchange not wait for start to complete:
vgchange --lock-start --lock-opt nowait ...
lvmlockd can be asked directly to stop all lockspaces:
lvmlockctl -S|--stop-lockspaces
To start only selected shared VGs, use the lvm.conf
activation/lock_start_list. When defined, only VG names in this
list are started by vgchange. If the list is not defined (the
default), all visible shared VGs are started. To start only
"vg1", use the following lvm.conf configuration:
activation {
    lock_start_list = [ "vg1" ]
    ...
}
Internal command locking
To optimize the use of LVM with lvmlockd, be aware of the three
kinds of locks and when they are used:
Global lock
The global lock is associated with global information, which is
information not isolated to a single VG. This includes:
• The global VG namespace.
• The set of orphan PVs and unused devices.
• The properties of orphan PVs, e.g. PV size.
The global lock is acquired in shared mode by commands that read
this information, or in exclusive mode by commands that change
it. For example, the command 'vgs' acquires the global lock in
shared mode because it reports the list of all VG names, and the
vgcreate command acquires the global lock in exclusive mode
because it creates a new VG name, and it takes a PV from the list
of unused PVs.
When an LVM command is given a tag argument, or uses select, it
must read all VGs to match the tag or selection, which causes the
global lock to be acquired.
VG lock
A VG lock is associated with each shared VG. The VG lock is
acquired in shared mode to read the VG and in exclusive mode to
change the VG or activate LVs. This lock serializes access to a
VG with all other LVM commands accessing the VG from all hosts.
The command 'vgs <vgname>' does not acquire the global lock (it
does not need the list of all VG names), but will acquire the VG
lock on each VG name argument.
LV lock
An LV lock is acquired before the LV is activated, and is
released after the LV is deactivated. If the LV lock cannot be
acquired, the LV is not activated. (LV locks are persistent and
remain in place when the activation command is done. Global and
VG locks are transient, and are held only while an LVM command is
running.)
lock retries
If a request for a global or VG lock fails due to a lock conflict
with another host, lvmlockd automatically retries for a short
time before returning a failure to the LVM command. If those
retries are insufficient, the LVM command will retry the entire
lock request a number of times specified by
global/lvmlockd_lock_retries before failing. If a request for an
LV lock fails due to a lock conflict, the command fails
immediately.
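For example, the retry count could be raised in lvm.conf (the value shown
is purely illustrative):
global {
    lvmlockd_lock_retries = 5
}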
Managing the global lock in sanlock VGs
The global lock exists in one of the sanlock VGs. The first
sanlock VG created will contain the global lock. Subsequent
sanlock VGs will each contain a disabled global lock that can be
enabled later if necessary.
The VG containing the global lock must be visible to all hosts
using sanlock VGs. For this reason, it can be useful to create a
small sanlock VG, visible to all hosts, and dedicated to just
holding the global lock. While not required, this strategy can
help to avoid difficulty in the future if VGs are moved or
removed.
The vgcreate command typically acquires the global lock, but in
the case of the first sanlock VG, there will be no global lock to
acquire until the first vgcreate is complete. So, creating the
first sanlock VG is a special case that skips the global lock.
vgcreate determines that it's creating the first sanlock VG when
no other sanlock VGs are visible on the system. It is possible
that other sanlock VGs do exist, but are not visible when
vgcreate checks for them. In this case, vgcreate will create a
new sanlock VG with the global lock enabled. When the other VG
containing a global lock appears, lvmlockd will then see more
than one VG with a global lock enabled. LVM commands will report
that there are duplicate global locks.
If the situation arises where more than one sanlock VG contains a
global lock, the global lock should be manually disabled in all
but one of them with the command:
lvmlockctl --gl-disable <vgname>
(The one VG with the global lock enabled must be visible to all
hosts.)
An opposite problem can occur if the VG holding the global lock
is removed. In this case, no global lock will exist following
the vgremove, and subsequent LVM commands will fail to acquire
it. In this case, the global lock needs to be manually enabled
in one of the remaining sanlock VGs with the command:
lvmlockctl --gl-enable <vgname>
(Using a small sanlock VG dedicated to holding the global lock
can avoid the case where the global lock must be manually enabled
after a vgremove.)
Internal lvmlock LV
A sanlock VG contains a hidden LV called "lvmlock" that holds the
sanlock locks. vgreduce cannot yet remove the PV holding the
lvmlock LV. To remove this PV, change the VG lock type to
"none", run vgreduce, then change the VG lock type back to
"sanlock". Similarly, pvmove cannot be used on a PV used by the
lvmlock LV.
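A sketch of that sequence (the VG and device names are hypothetical; the
VG must not be in use by any other host while its lock type is none):
vgchange --lock-type none vg0
vgreduce vg0 /dev/sdb
vgchange --lock-type sanlock vg0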
To place the lvmlock LV on a specific device, create the VG with
only that device, then use vgextend to add other devices.
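For example (names hypothetical), to keep the lvmlock LV on /dev/sdb:
vgcreate --shared vg0 /dev/sdb
vgextend vg0 /dev/sdc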
LV activation
In a shared VG, LV activation involves locking through lvmlockd,
and the following values are possible with lvchange/vgchange -a:
y|ey
The command activates the LV in exclusive mode, allowing a
single host to activate the LV. Before activating the LV,
the command uses lvmlockd to acquire an exclusive lock on
the LV. If the lock cannot be acquired, the LV is not
activated and an error is reported. This would happen if
the LV is active on another host.
sy
The command activates the LV in shared mode, allowing
multiple hosts to activate the LV concurrently. Before
activating the LV, the command uses lvmlockd to acquire a
shared lock on the LV. If the lock cannot be acquired,
the LV is not activated and an error is reported. This
would happen if the LV is active exclusively on another
host. If the LV type prohibits shared access, such as a
snapshot, the command will report an error and fail. The
shared mode is intended for a multi-host/cluster
application or file system. LV types that cannot be used
concurrently from multiple hosts include thin, cache,
raid, mirror, and snapshot.
n
The command deactivates the LV. After deactivating the
LV, the command uses lvmlockd to release the current lock
on the LV.
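Illustrative activation commands for a shared VG (the VG and LV names are
hypothetical):
lvchange -aey vg0/lv0        (activate exclusively on this host)
lvchange -asy vg0/lv0        (activate shared on this host)
lvchange -an vg0/lv0         (deactivate and release the LV lock)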
Manually repairing a shared VG
Some failure conditions may not be repairable while the VG has a
shared lock type. In these cases, it may be possible to repair
the VG by forcibly changing the lock type to "none". This is
done by adding "--lock-opt force" to the normal command for
changing the lock type: vgchange --lock-type none VG. The VG
lockspace should first be stopped on all hosts, and you should be
certain that no hosts are using the VG before this is done.
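A sketch of the forced change (the VG name is hypothetical):
vgchange --lock-type none --lock-opt force vg0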
Recover from lost PV holding sanlock locks
In a sanlock VG, the sanlock locks are held on the hidden
"lvmlock" LV. If the PV holding this LV is lost, a new lvmlock
LV needs to be created. To do this, ensure no hosts are using
the VG, then forcibly change the lock type to "none" (see above).
Then change the lock type back to "sanlock" with the normal
command for changing the lock type: vgchange --lock-type sanlock
VG. This recreates the internal lvmlock LV with the necessary
locks.
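A minimal sketch of the recovery sequence (the VG name is hypothetical):
vgchange --lock-type none --lock-opt force vg0
vgchange --lock-type sanlock vg0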
Locking system failures
lvmlockd failure
If lvmlockd fails or is killed while holding locks, the locks are
orphaned in the lock manager. Orphaned locks must be cleared or
adopted before the associated resources can be accessed normally.
If lock adoption is enabled, lvmlockd keeps a record of locks in
the adopt-file. A subsequent instance of lvmlockd will then
adopt locks orphaned by the previous instance. Adoption must be
enabled in both instances (--adopt|-A 1). Without adoption, the
lock manager or host would require a reset to clear orphaned lock
state.
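For example, adoption is enabled by starting lvmlockd with:
lvmlockd --adopt 1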
dlm/corosync failure
If dlm or corosync fail, the clustering system will fence the
host using a method configured within the dlm/corosync clustering
environment.
LVM commands on other hosts will be blocked from acquiring any
locks until the dlm/corosync recovery process is complete.
sanlock lease storage failure
If the PV under a sanlock VG's lvmlock LV is disconnected,
unresponsive or too slow, sanlock cannot renew the lease for the
VG's locks. After some time, the lease will expire, and locks
that the host owns in the VG can be acquired by other hosts. The
VG must be forcibly deactivated on the host with the expiring
lease before other hosts can acquire its locks. This is
necessary for data protection.
When the sanlock daemon detects that VG storage is lost and the
VG lease is expiring, it runs the command lvmlockctl --kill
<vgname>. This command emits a syslog message stating that
storage is lost for the VG, and that LVs in the VG must be
immediately deactivated.
If no LVs are active in the VG, then the VG lockspace will be
removed, and errors will be reported when trying to use the VG.
Use the lvmlockctl --drop command to clear the stale lockspace
from lvmlockd.
If the VG has active LVs, they must be quickly deactivated before
the locks expire. After all LVs are deactivated, run lvmlockctl
--drop <vgname> to clear the expiring lockspace from lvmlockd.
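A minimal sketch of those manual steps (the VG name is hypothetical; if
LVs cannot be deactivated normally, the forcible approach described in the
next section can be used):
vgchange -an vg0             (deactivate all LVs in the VG)
lvmlockctl --drop vg0        (clear the expiring lockspace from lvmlockd)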
If all LVs in the VG are not deactivated within about 40 seconds,
sanlock uses wdmd and the local watchdog to reset the host. The
machine reset is effectively a severe form of "deactivating" LVs
before they can be activated on other hosts. The reset is
considered a better alternative than having LVs used by multiple
hosts at once, which could easily damage or destroy their
content.
sanlock lease storage failure automation
When the sanlock daemon detects that the lease storage is lost,
it runs the command lvmlockctl --kill <vgname>. This lvmlockctl
command can be configured to run another command to forcibly
deactivate LVs, taking the place of the manual process described
above. The other command is configured in the lvm.conf
lvmlockctl_kill_command setting. The VG name is appended to the
end of the command specified.
The lvmlockctl_kill_command should forcibly deactivate LVs in the
VG, ensuring that existing writes to LVs in the VG are complete
and that further writes to the LVs in the VG will be rejected.
If it is able to do this successfully, it should exit with
success, otherwise it should exit with an error. If lvmlockctl
--kill gets a successful result from lvmlockctl_kill_command, it
tells lvmlockd to drop locks for the VG (the equivalent of
running lvmlockctl --drop). If this completes in time, a machine
reset can be avoided.
One possible option is to create a script my_vg_kill_script.sh:
#!/bin/bash
VG=$1
# replace dm table with the error target for top level LVs
dmsetup wipe_table -S "uuid=~LVM && vgname=$VG && lv_layer=\"\""
# check that the error target is in place
dmsetup table -c -S "uuid=~LVM && vgname=$VG && lv_layer=\"\"" | grep -vw error
if [[ $? -ne 0 ]] ; then
    exit 0
fi
exit 1
Set in lvm.conf:
lvmlockctl_kill_command="/usr/sbin/my_vg_kill_script.sh"
(The script and dmsetup commands should be tested with the actual
VG to ensure that all top level LVs are properly disabled.)
If the lvmlockctl_kill_command is not configured, or fails,
lvmlockctl --kill will emit syslog messages as described in the
previous section, notifying the user to manually deactivate the
VG before sanlock resets the machine.
sanlock daemon failure
If the sanlock daemon fails or exits while a lockspace is
started, the local watchdog will reset the host. This is
necessary to protect any application resources that depend on
sanlock leases.
Changing dlm cluster name
When a dlm VG is created, the cluster name is saved in the VG
metadata. To use the VG, a host must be in the named dlm
cluster. If the dlm cluster name changes, or the VG is moved to
a new cluster, the dlm cluster name saved in the VG must also be
changed.
To see the dlm cluster name saved in the VG, use the command:
vgs -o+locktype,lockargs <vgname>
To change the dlm cluster name in the VG when the VG is still
used by the original cluster:
• Start the VG on the host changing the lock type
vgchange --lock-start <vgname>
• Stop the VG on all other hosts:
vgchange --lock-stop <vgname>
• Change the VG lock type to none on the host where the VG is
started:
vgchange --lock-type none <vgname>
• Change the dlm cluster name on the hosts or move the VG to the
new cluster. The new dlm cluster must now be running on the
host. Verify the new name by:
cat /sys/kernel/config/dlm/cluster/cluster_name
• Change the VG lock type back to dlm which sets the new cluster
name:
vgchange --lock-type dlm <vgname>
• Start the VG on hosts to use it:
vgchange --lock-start <vgname>
To change the dlm cluster name in the VG when the dlm cluster
name has already been changed on the hosts, or the VG has already
moved to a different cluster:
• Ensure the VG is not being used by any hosts.
• The new dlm cluster must be running on the host making the
change. The current dlm cluster name can be seen by:
cat /sys/kernel/config/dlm/cluster/cluster_name
• Change the VG lock type to none:
vgchange --lock-type none --lock-opt force <vgname>
• Change the VG lock type back to dlm which sets the new cluster
name:
vgchange --lock-type dlm <vgname>
• Start the VG on hosts to use it:
vgchange --lock-start <vgname>
Changing a local VG to a shared VG
All LVs must be inactive to change the lock type.
lvmlockd must be configured and running as described in USAGE.
• Change a local VG to a shared VG with the command:
vgchange --lock-type sanlock|dlm <vgname>
• Start the VG on hosts to use it:
vgchange --lock-start <vgname>
Changing a shared VG to a local VG
All LVs must be inactive to change the lock type.
• Start the VG on the host making the change:
vgchange --lock-start <vgname>
• Stop the VG on all other hosts:
vgchange --lock-stop <vgname>
• Change the VG lock type to none on the host where the VG is
started:
vgchange --lock-type none <vgname>
If the VG cannot be started with the previous lock type, then the
lock type can be forcibly changed to none with:
vgchange --lock-type none --lock-opt force <vgname>
To change a VG from one lock type to another (i.e. between
sanlock and dlm), first change it to a local VG, then to the new
type.
Changing a clvm/clustered VG to a shared VG
All LVs must be inactive to change the lock type.
First change the clvm/clustered VG to a local VG. Within a
running clvm cluster, change a clustered VG to a local VG with
the command:
vgchange -cn <vgname>
If the clvm cluster is no longer running on any nodes, then extra
options can be used to forcibly make the VG local. Caution: this
is only safe if all nodes have stopped using the VG:
vgchange --lock-type none --lock-opt force <vgname>
After the VG is local, follow the steps described in "changing a
local VG to a shared VG".
Extending an LV active on multiple hosts
With lvmlockd and dlm, a special clustering procedure is used to
refresh a shared LV on remote cluster nodes after it has been
extended on one node.
When an LV holding gfs2 or ocfs2 is active on multiple hosts with
a shared lock, lvextend is permitted to run with an existing
shared LV lock in place of the normal exclusive LV lock.
After lvextend has finished extending the LV, it sends a remote
request to other nodes running the dlm to run 'lvchange
--refresh' on the LV. This uses dlm_controld and corosync
features.
Some special --lockopt values can be used to modify this process.
"shupdate" permits the lvextend update with an existing shared
lock if it isn't otherwise permitted. "norefresh" prevents the
remote refresh operation.
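For example (the LV name and size are hypothetical):
lvextend --lockopt shupdate -L+10G vg0/lv0
lvextend --lockopt norefresh -L+10G vg0/lv0
The first form permits the update with an existing shared lock; the second
skips the remote refresh on other nodes.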
Limitations of shared VGs
Things that do not yet work in shared VGs:
• using external origins for thin LVs
• splitting snapshots from LVs
• splitting mirrors in sanlock VGs
• pvmove of entire PVs, or under LVs activated with shared locks
• vgsplit and vgmerge (convert to a local VG to do this)
lvmlockd changes from clvmd
(See above for converting an existing clvm VG to a shared VG.)
While lvmlockd and clvmd are entirely different systems, LVM
command usage remains similar. Differences are more notable when
using lvmlockd's sanlock option.
Visible usage differences between shared VGs (using lvmlockd) and
clvm/clustered VGs (using clvmd):
• lvm.conf is configured to use lvmlockd by setting
use_lvmlockd=1. clvmd used locking_type=3.
• vgcreate --shared creates a shared VG. vgcreate --clustered y
created a clvm/clustered VG.
• lvmlockd adds the option of using sanlock for locking, avoiding
the need for network clustering.
• lvmlockd defaults to the exclusive activation mode whenever the
activation mode is unspecified, i.e. -ay means -aey, not -asy.
• lvmlockd commands always apply to the local host, and never
have an effect on a remote host. (The activation option 'l' is
not used.)
• lvmlockd saves the cluster name for a shared VG using dlm.
Only hosts in the matching cluster can use the VG.
• lvmlockd requires starting/stopping shared VGs with vgchange
--lock-start and --lock-stop.
• vgremove of a sanlock VG may fail indicating that all hosts
have not stopped the VG lockspace. Stop the VG on all hosts
using vgchange --lock-stop.
• vgreduce or pvmove of a PV in a sanlock VG will fail if it
holds the internal "lvmlock" LV that holds the sanlock locks.
• lvmlockd uses lock retries instead of lock queueing, so high
lock contention may require increasing
global/lvmlockd_lock_retries to avoid transient lock failures.
• lvmlockd includes VG reporting options lock_type and lock_args,
and LV reporting option lock_args to view the corresponding
metadata fields.
• In the 'vgs' command's sixth VG attr field, "s" for "shared" is
displayed for shared VGs.
• If lvmlockd fails or is killed while in use, locks it held
remain but are orphaned in the lock manager. lvmlockd can be
restarted with an option to adopt the orphan locks from the
previous instance of lvmlockd.