Main Cluster components
- cluster-glue
- resource-agents
- libqb
- corosync
- pacemaker
- dlm
The main objective is to build a corosync cluster based on corosync 2.0, or what Andrew Beekhof calls option 3: Everyone talks to corosync 2.0
In order to clarify current possible corosync architectures, you may want to read this post from Andrew:
http://theclusterguy.clusterlabs.org/post/34604901720/pacemaker-and-cluster-filesystems
We want to do it on top of Ubuntu (the current LTS is 12.04, precise). The current Ubuntu cluster stack is based on option 2: Everyone talks to CMAN. This is the main source for Ubuntu cluster status: https://wiki.ubuntu.com/ClusterStack/Precise
While option 2 is the safest approach for the next couple of years, we want to build on option 3, as this is the future option, architecturally superior to options 1 and 2.
So this document is about building an option 3 cluster on top of Ubuntu precise.
TODO: link main references of doc: http://clusterlabs.org/doc/
TODO: explain current problem with OCFS2
TODO: explain GFS2 cluster filesystem needs and issues
TODO: introduce and explain base components and extra components for gfs2
Important note: in order to prevent compile errors, you must start in a clean environment, with no previous versions of the components you are building installed on your system. So be sure to make these checks before continuing:
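The exact checks are left open here; a minimal sketch of what to verify, assuming a Debian/Ubuntu box (the package names are the usual distro ones):

```shell
# Hypothetical pre-build check: warn if distro packages of the components we
# are about to build from source are still installed (a common cause of
# confusing compile and link errors).
for pkg in cluster-glue resource-agents libqb corosync pacemaker dlm; do
    if dpkg -l "$pkg" 2>/dev/null | grep -q '^ii'; then
        echo "WARNING: distro package $pkg is installed; remove it first"
    fi
done
echo "pre-build check done"
```

Purging any packages this reports should be enough to leave the environment clean.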
Create a script to set your environment variables: $HOME/exports.sh
#!/bin/sh
export PREFIX=/opt/ha
export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig
export LCRSODIR=$PREFIX/libexec/lcrso
export LDFLAGS=-L$PREFIX/lib
export CPPFLAGS=-I$PREFIX/include
export CFLAGS=-I$PREFIX/include
export CLUSTER_USER=hacluster
export CLUSTER_GROUP=haclient
Before building each component, always run: source $HOME/exports.sh
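A quick way to confirm the sourcing works as expected (a sketch using a throwaway copy under /tmp, so it is safe to run anywhere):

```shell
# Write a minimal copy of the exports script and source it, then verify the
# variables that every ./configure run below depends on are populated.
cat > /tmp/exports.sh <<'EOF'
export PREFIX=/opt/ha
export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig
EOF
. /tmp/exports.sh
echo "PREFIX=$PREFIX"
echo "PKG_CONFIG_PATH=$PKG_CONFIG_PATH"
```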
Create hacluster user and haclient group. This example uses uid and gid 120, but you can choose any free one:
sudo groupadd -g 120 haclient
sudo useradd -u 120 -g 120 -s /bin/false -d /usr/lib/heartbeat hacluster
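To double-check the pair was created correctly (a sketch; it reports rather than fails, so it can be run before or after the commands above):

```shell
# Look up the cluster user and group in the system databases via getent.
for entry in "passwd hacluster" "group haclient"; do
    db=${entry%% *}
    name=${entry##* }
    if getent "$db" "$name" >/dev/null 2>&1; then
        echo "$db $name: present"
    else
        echo "$db $name: missing"
    fi
done
```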
Install needed development packages:
sudo aptitude install build-essential git mercurial
sudo aptitude install autogen automake libtool pkg-config groff autopoint bison bison-dev
sudo aptitude install libncurses-dev libreadline-dev
sudo aptitude install libaio-dev libglib2.0-dev libxml2-dev libbz2-dev uuid-dev
sudo aptitude install libnss3-dev libxslt-dev
# for gfs2-utils:
sudo aptitude install libblkid-dev check
# crmsh
sudo aptitude install python-lxml
cluster-glue is the base glue layer for corosync and pacemaker. It installs development headers with variables describing the cluster environment, such as common cluster paths.
For example, it creates the include file include/heartbeat/glue-config.h, with common definitions for components like corosync and pacemaker.
Build and install procedure:
hg clone http://hg.linux-ha.org/glue
cd glue
./autogen.sh
./configure --prefix=$PREFIX --with-daemon-user=${CLUSTER_USER} \
--with-daemon-group=${CLUSTER_GROUP} --enable-fatal-warnings=no \
--with-ocf-root=$PREFIX/usr/lib/ocf
make
sudo make install
Clone and build:
git clone https://github.com/asalkeld/libqb.git
cd libqb
./autogen.sh
./configure --prefix=$PREFIX
make
sudo make install
Clone and build:
git clone git://github.com/ClusterLabs/resource-agents
cd resource-agents
./autogen.sh && ./configure --prefix=$PREFIX
make
sudo make install
If we later install resource agents into the standard system directory, we want them to end up in our OCF dir, so symlink it:
sudo ln -s /opt/ha/usr/lib/ocf /usr/lib/ocf
Clone, build and install:
git clone git://github.com/corosync/corosync.git
cd corosync
./autogen.sh
./configure --prefix=$PREFIX
make
sudo make install
Clone, build and install:
git clone git://github.com/ClusterLabs/pacemaker.git
cd pacemaker
./autogen.sh
./configure --prefix=$PREFIX --without-cman \
--without-heartbeat --with-corosync \
--enable-fatal-warnings=no --with-lcrso-dir=$LCRSODIR
make
sudo make install
The configure script should give a result like this:
Version = 1.1.8 (Build: f94e1e4)
Features = libqb-logging libqb-ipc lha-fencing upstart systemd nagios corosync-native
Prefix = /opt/ha
Executables = /opt/ha/sbin
Man pages = /opt/ha/share/man
Libraries = /opt/ha/lib
Header files = /opt/ha/include
Arch-independent files = /opt/ha/share
State information = /opt/ha/var
System configuration = /opt/ha/etc
Corosync Plugins = /opt/ha/lib
Use system LTDL = yes
HA group name = haclient
HA user name = hacluster
CFLAGS = -I/opt/ha/include -I/opt/ha/include -I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include -ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes -Wwrite-strings
Libraries = -lcorosync_common -lplumb -lpils -lqb -lbz2 -lxslt -lxml2 -lc -lglib-2.0 -lglib-2.0 -luuid -lrt -ldl -lglib-2.0 -lltdl -L/opt/ha/lib -lqb -ldl -lrt -lpthread
Stack Libraries = -L/opt/ha/lib -lqb -ldl -lrt -lpthread -L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap -L/opt/ha/lib -lquorum
Clone, build and install:
hg clone http://hg.savannah.nongnu.org/hgweb/crmsh/
cd crmsh
./autogen.sh
./configure --prefix=$PREFIX
make
sudo make install
This is a fast and basic cluster configuration.
Put these environment variables at the end of your .bashrc (for both your user and root):
export PATH=/opt/ha/sbin:$PATH
export MANPATH=/opt/ha/share/man:$MANPATH
export PYTHONPATH=/opt/ha/lib/python2.7/site-packages
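Since the distro may ship its own crm or corosync binaries, it is worth confirming that /opt/ha/sbin really ends up first in the search path (sketch):

```shell
# Prepend the self-built tree and check the first PATH entry; tools like
# crm and corosync will then resolve to /opt/ha/sbin before any distro copy.
export PATH=/opt/ha/sbin:$PATH
first=$(printf '%s\n' "$PATH" | tr ':' '\n' | head -n 1)
echo "first PATH entry: $first"
```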
Copy corosync init script to init.d:
sudo cp /opt/ha/etc/init.d/corosync /etc/init.d/corosync
Create config file:
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: fiestaha
    interface {
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0
        ttl: 1
        bindnetaddr: 192.168.132.11
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: local7
    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 0
}
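The quorum block is what makes this two-node setup workable. Votequorum's normal majority rule would require both nodes to be up; a sketch of the arithmetic, and why two_node: 1 is needed:

```shell
# Majority quorum is expected_votes / 2 + 1 (integer division). For two
# votes that is 2, so losing either node would lose quorum and stop all
# services; two_node: 1 special-cases this so one surviving node stays
# quorate (normally paired with wait_for_all, disabled above on purpose).
expected_votes=2
quorum=$(( expected_votes / 2 + 1 ))
echo "majority quorum for $expected_votes votes: $quorum"
```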
TODO: good rsyslog config
On /etc/rsyslog.d/50-debian-default:
local7.* /var/log/corosync.log
*.*;auth,authpriv.none;local7.none -/var/log/syslog
Start cluster services:
sudo service corosync start
sudo service pacemaker start
Logs should be in /var/log/corosync.log, via the rsyslog daemon.
We can see the cluster config with crm: sudo -i crm configure show:
bercab@fiesta-ha1:/usr/src$ sudo -i crm configure show
node $id="193243328" fiesta-ha1
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-f94e1e4" \
    cluster-infrastructure="corosync" \
    stonith-enabled="false"
TODO: show crm_mon
Corosync output: corosync -f:
Feb 14 09:08:23 notice [MAIN ] Corosync Cluster Engine ('2.3.0.3-6617'): started and ready to provide service.
Feb 14 09:08:23 info [MAIN ] Corosync built-in features: pie relro bindnow
Feb 14 09:08:23 notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Feb 14 09:08:23 notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Feb 14 09:08:23 notice [TOTEM ] The network interface [192.168.132.11] is now up.
Feb 14 09:08:23 notice [SERV ] Service engine loaded: corosync configuration map access [0]
Feb 14 09:08:23 notice [SERV ] Service engine loaded: corosync configuration service [1]
Feb 14 09:08:23 notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Feb 14 09:08:23 notice [SERV ] Service engine loaded: corosync profile loading service [4]
Feb 14 09:08:23 notice [QUORUM] Using quorum provider corosync_votequorum
Feb 14 09:08:23 notice [QUORUM] This node is within the primary component and will provide service.
Feb 14 09:08:23 notice [QUORUM] Members[0]:
Feb 14 09:08:23 notice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Feb 14 09:08:23 notice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Feb 14 09:08:23 notice [TOTEM ] A processor joined or left the membership and a new membership (192.168.132.11:4) was formed.
Feb 14 09:08:23 notice [QUORUM] Members[1]: 193243328
Feb 14 09:08:23 notice [MAIN ] Completed service synchronization, ready to provide service.
The main docs about drbd install are here:
http://www.drbd.org/users-guide/s-build-deb.html
Install dependencies:
sudo aptitude install dpkg-dev fakeroot debhelper debconf-utils docbook-xml \
docbook-xsl dpatch flex xsltproc module-assistant
Clone git drbd project:
git clone git://git.drbd.org/drbd-8.4.git
Write the configuration /etc/drbd.d/ha.res:
resource ha0 {
    net {
        protocol C;
        allow-two-primaries yes;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        sndbuf-size 0;
        max-buffers 4000;
        max-epoch-size 4000;
    }
    startup {
        become-primary-on both;
    }
    disk {
        # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
        # c-plan-ahead c-delay-target c-fill-target c-max-rate
        # c-min-rate disk-timeout
        fencing resource-only;
    }
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
        #fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        #after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    volume 0 {
        device /dev/drbd0;
        disk /dev/vgha/lvdrbd0;
        meta-disk internal;
    }
    volume 1 {
        device /dev/drbd1;
        disk /dev/vgha/lvdrbd1;
        meta-disk internal;
    }
    volume 2 {
        device /dev/drbd2;
        disk /dev/vgha/lvdrbd2;
        meta-disk internal;
    }
    on fiesta-ha1 {
        address 192.168.132.11:7789;
    }
    on fiesta-ha2 {
        address 192.168.132.12:7789;
    }
}
Initial config (where resource is the resource name, ha0 in our case):
sudo drbdadm create-md resource
sudo drbdadm up resource
sudo drbdadm primary --force resource
Main guide: http://www.drbd.org/users-guide/s-lvm-add-pv.html
As a first step, we will put the cluster in maintenance mode:
crm configure property maintenance-mode=true
On both nodes:
drbdadm adjust ha0 -d
drbdadm adjust ha0
This gives an error for the new volume. Run:
drbdmeta 2 v08 /dev/vgha/lvdrbd2 internal create-md
drbdadm adjust ha0 -d
The dlm_controld code comes with no configure script, so to make things work with our /opt/ha settings, we have patched the Makefiles. See the patch here: https://gist.github.com/bercab/4951303
Build and install:
git clone http://git.fedorahosted.org/git/dlm.git
cd dlm
wget https://gist.github.com/bercab/4951303/raw/71c9a5846099b72e2b208f300168c60b1421cf65/dlm.patch
patch -p1 < dlm.patch
make
sudo make install
Parameters can be passed to dlm_controld in two ways: (1) via the /etc/dlm/dlm.conf config file, or (2) as command-line arguments. In case (2), pacemaker primitives for controld can pass arguments. However, we prefer to put these parameters in /etc/dlm/dlm.conf:
enable_quorum_lockspace=0
log_debug=1
debug_logfile=1
# other disabled options, useful for debugging:
#daemon_debug=1
#foreground=1
#enable_startup_fencing=0
#enable_fencing=0
#enable_quorum_fencing=0
#fence_all dlm_stonith
Note: current docs about using the kernel dlm_controld use the -q option (or enable_quorum_fencing in dlm.conf). As we have the correct two-node settings in our corosync.conf, we shouldn't need this parameter.
For reference, use the manual page; there is much more info there than you will find by googling.
dlm_controld will be called by the controld resource agent, but to test it, it is good practice to first run it on the command line and debug possible errors.
As prerequisites, we need the dlm module loaded, configfs mounted and corosync running. Corosync is needed because dlm uses it as its messaging layer:
sudo modprobe configfs
sudo modprobe dlm
sudo mount -t configfs none /sys/kernel/config
sudo service corosync start
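Before running it by hand, it can be worth verifying those prerequisites actually hold (a sketch; it reports status instead of failing, so it is harmless on any machine):

```shell
# Check two of the dlm_controld prerequisites from the step above:
# the dlm kernel module must be loaded and configfs must be mounted.
dlm_loaded=no
configfs_mounted=no
grep -qw dlm /proc/modules 2>/dev/null && dlm_loaded=yes
grep -q configfs /proc/mounts 2>/dev/null && configfs_mounted=yes
echo "dlm module loaded: $dlm_loaded"
echo "configfs mounted: $configfs_mounted"
```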
Run dlm_controld --foreground --daemon_debug. The result should look like this:
root@fiesta-ha1:~# dlm_controld --foreground --daemon_debug
64545 config file log_debug = 1 cli_set 0 use 1
64545 config file debug_logfile = 1 cli_set 0 use 1
64545 dlm_controld 4.0.0 started
64545 our_nodeid 193243328
64545 found /dev/misc/dlm-control minor 57
64545 found /dev/misc/dlm-monitor minor 56
64545 found /dev/misc/dlm_plock minor 55
64545 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
64545 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
64545 set log_debug 1
64545 set recover_callbacks 1
64545 cmap totem.cluster_name = 'fiestaha'
64545 set cluster_name fiestaha
64545 /dev/misc/dlm-monitor fd 10
64545 cluster quorum 1 seq 4 nodes 1
64545 cluster node 193243328 added seq 4
64545 set_configfs_node 193243328 192.168.132.11 local 1
64545 cpg_join dlm:controld ...
64545 setup_cpg_daemon 12
64545 dlm:controld conf 1 1 0 memb 193243328 join 193243328 left
64545 fence work wait for cluster ringid
64545 dlm:controld ring 193243328:4 1 memb 193243328
64545 fence_in_progress_unknown 0 startup
64545 receive_protocol 193243328 max 3.1.1.0 run 0.0.0.0
64545 daemon node 193243328 prot max 0.0.0.0 run 0.0.0.0
64545 daemon node 193243328 save max 3.1.1.0 run 0.0.0.0
64545 set_protocol member_count 1 propose daemon 3.1.1 kernel 1.1.1
64545 receive_protocol 193243328 max 3.1.1.0 run 3.1.1.0
64545 daemon node 193243328 prot max 3.1.1.0 run 0.0.0.0
64545 daemon node 193243328 save max 3.1.1.0 run 3.1.1.0
64545 run protocol from nodeid 193243328
64545 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
64545 plocks 13
64545 receive_protocol 193243328 max 3.1.1.0 run 3.1.1.0
At this point, corosync should log something like this:
Feb 14 10:54:20 debug [QUORUM] lib_init_fn: conn=0x7f286c87b4f0
Feb 14 10:54:20 debug [QUORUM] got quorum_type request on 0x7f286c87b4f0
Feb 14 10:54:20 debug [QUORUM] got trackstart request on 0x7f286c87b4f0
Feb 14 10:54:20 debug [QUORUM] sending initial status to 0x7f286c87b4f0
Feb 14 10:54:20 debug [QUORUM] sending quorum notification to 0x7f286c87b4f0, length = 52
TODO: start drbd and mount filesystem
We should be able to start drbd and mount filesystem:
service drbd start
mount -t gfs2 /dev/drbd1 /mnt/fiestacomm
On syslog, you should see something like this:
Feb 14 10:59:54 fiesta-ha1 dlm_controld[23971]: 70232 dlm_controld 4.0.0 started
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.395327] GFS2: fsid=fiestaha:commfs: Trying to join cluster "lock_dlm", "fiestaha:commfs"
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.395484] dlm: Using TCP for communications
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399774] dlm: commfs: dlm_recover 1
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399810] dlm: commfs: add member 193243328
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399812] dlm: commfs: dlm_recover_members 1 nodes
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399815] dlm: commfs: generation 1 slots 1 1:193243328
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399817] dlm: commfs: dlm_recover_directory
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399834] dlm: commfs: dlm_recover_directory 0 entries
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399837] dlm: commfs: dlm_callback_resume 0
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399853] dlm: commfs: dlm_recover 1 generation 1 done: 0 ms
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399877] dlm: commfs: joining the lockspace group...
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399878] dlm: commfs: group event done 0 0
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399878] dlm: commfs: join complete
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.900053] GFS2: fsid=fiestaha:commfs: first mounter control generation 0
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.900056] GFS2: fsid=fiestaha:commfs: Joined cluster. Now mounting FS...
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.918241] GFS2: fsid=fiestaha:commfs.0: jid=0, already locked for use
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.918243] GFS2: fsid=fiestaha:commfs.0: jid=0: Looking at journal...
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920294] GFS2: fsid=fiestaha:commfs.0: jid=0: Done
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920358] GFS2: fsid=fiestaha:commfs.0: jid=1: Trying to acquire journal lock...
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920556] GFS2: fsid=fiestaha:commfs.0: jid=1: Looking at journal...
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.926916] GFS2: fsid=fiestaha:commfs.0: jid=1: Done
Feb 14 11:06:12 fiesta-ha1 kernel: [70610.926968] GFS2: fsid=fiestaha:commfs.0: first mount done, others may mount
Install gfs2-utils from git:
git clone http://git.fedorahosted.org/git/gfs2-utils.git
cd gfs2-utils
./autogen.sh
./configure --prefix=$PREFIX
make
sudo make install
Prepare GFS2 Filesystems:
TODO: explain parameters of mkfs.gfs2
Create filesystem: sudo mkfs.gfs2 -p lock_dlm -j 2 -t fiestaha:commfs /dev/drbd1:
root@fiesta-ha1:~# mkfs.gfs2 -p lock_dlm -j 2 -t fiestaha:commfs /dev/drbd1
This will destroy any data on /dev/drbd1.
It appears to contain: data
Are you sure you want to proceed? [y/n]y
Device: /dev/drbd1
Block size: 4096
Device size: 1.00 GB (262127 blocks)
Filesystem size: 1.00 GB (262125 blocks)
Journals: 2
Resource groups: 4
Locking protocol: "lock_dlm"
Lock table: "fiestaha:commfs"
UUID: 1074a7a0-3498-4553-09f7-97e4b5d95def
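A short annotated sketch of the mkfs.gfs2 arguments used above (the -t value is assembled here just to show its structure; all names match this guide):

```shell
# -p lock_dlm : use the DLM for cluster-wide locking (lock_nolock would be
#               for a single-node mount)
# -j 2        : number of journals; one is needed per node that mounts the fs
# -t <C>:<F>  : locktable; C must equal cluster_name in corosync.conf and
#               F must be a name unique within the cluster
cluster_name=fiestaha
fsname=commfs
echo "mkfs.gfs2 -p lock_dlm -j 2 -t ${cluster_name}:${fsname} /dev/drbd1"
```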
Primitives:
primitive p_fscomm ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/ha0/0" \
    directory="/opt/fiestacomm" fstype="gfs2"
clone fscomm_clone p_fscomm
colocation fscomm_ondrbd inf: fscomm_clone ms_drbd_ha0:Master
order fscomm_after_dlm_drbd inf: ms_drbd_ha0:promote dlm_clone:start fscomm_clone:start
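The constraints above reference dlm_clone and ms_drbd_ha0, which are defined elsewhere in the CIB. As a sketch, the dlm side might look like this (the resource names are assumptions; ocf:pacemaker:controld is the standard agent shipped with pacemaker):

```
primitive p_dlm ocf:pacemaker:controld \
    op monitor interval="60" timeout="60"
clone dlm_clone p_dlm meta interleave="true"
```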