.. highlight:: sh
   :linenothreshold: 1

===========================
Ubuntu cluster from scratch
===========================

Introduction
============

The main objective is to build a cluster based on corosync 2.0, or what
Andrew Beekhof calls *option 3*: everyone talks to corosync 2.0.

To clarify the currently possible corosync architectures, you may want to
read this post from Andrew:
http://theclusterguy.clusterlabs.org/post/34604901720/pacemaker-and-cluster-filesystems

We want to do it on top of Ubuntu (the current LTS is 12.04, precise). The
current Ubuntu cluster stack is based on *option 2*: everyone talks to CMAN.
This is the main source on the state of the Ubuntu cluster stack:
https://wiki.ubuntu.com/ClusterStack/Precise

While option 2 is the safest approach for the next couple of years, we want
to build on *option 3*, as it is the future option, architecturally superior
to options 1 and 2. So this document is about building an *option 3* cluster
on top of Ubuntu precise.

TODO: link main references of doc: http://clusterlabs.org/doc/

TODO: explain current problem with OCFS2

TODO: explain GFS2 cluster filesystem needs and issues

Cluster Components
==================

TODO: intro and explain base components and extra components for gfs2

Main cluster components
-----------------------

* cluster-glue
* resource-agents
* libqb
* corosync
* pacemaker
* dlm

Other requirements
------------------

* kernel 3.4 (for dlm_controld locking)
* drbd
* gfs2

Build
=====

**Important note**: in order to prevent compile errors, you must start from a
**clean environment**, with no previous versions of the components you are
building installed on your system. Be sure to make these checks before
continuing:

- Remove all cluster-related deb packages installed on your system: cman,
  corosync, pacemaker, libqb, cluster-glue, etc.
- If you are compiling a previously installed package (for example, you are
  compiling a git-updated corosync while the previous corosync is still
  installed), the old includes and libs can produce lots of compile errors,
  because the build picks up the old installed headers. Continuing with the
  corosync example, you should remove ``$PREFIX/include/corosync`` before
  compiling the new corosync version.

Preparing the environment
-------------------------

Create a script, ``$HOME/exports.sh``, to set your environment variables::

    #!/bin/sh
    export PREFIX=/opt/ha
    export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig
    export LCRSODIR=$PREFIX/libexec/lcrso
    export LDFLAGS=-L$PREFIX/lib
    export CPPFLAGS=-I$PREFIX/include
    export CFLAGS=-I$PREFIX/include
    export CLUSTER_USER=hacluster
    export CLUSTER_GROUP=haclient

Before building any component, always do: ``source $HOME/exports.sh``

Create the hacluster user and haclient group. This example uses *uid* and
*gid* 120, but you can choose any free ones::

    sudo groupadd -g 120 haclient
    sudo useradd -u 120 -g 120 -s /bin/false -d /usr/lib/heartbeat hacluster

Install the needed development packages::

    sudo aptitude install build-essential git mercurial
    sudo aptitude install autogen automake libtool pkg-config groff autopoint bison
    sudo aptitude install libncurses-dev libreadline-dev
    sudo aptitude install libaio-dev libglib2.0-dev libxml2-dev libbz2-dev uuid-dev
    sudo aptitude install libnss3-dev libxslt-dev
    # for gfs2-utils:
    sudo aptitude install libblkid-dev check
    # for crmsh:
    sudo aptitude install python-lxml

cluster-glue
------------

cluster-glue is the base glue for corosync and pacemaker. It provides
development headers with variables about the cluster environment, like
common cluster paths. As an example, it creates the include file
``include/heartbeat/glue-config.h``, with common definitions for components
like corosync and pacemaker.
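Forgetting the ``source`` step is a common cause of configure failures. A
minimal self-check sketch (illustrative only: the ``/tmp`` copy stands in for
``$HOME/exports.sh``, which you would normally source directly)::

    #!/bin/sh
    # Illustrative only: write a minimal exports file and confirm the
    # variables the builds rely on are really set in the current shell.
    printf 'export PREFIX=/opt/ha\n' > /tmp/exports.sh
    printf 'export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig\n' >> /tmp/exports.sh
    printf 'export LCRSODIR=$PREFIX/libexec/lcrso\n' >> /tmp/exports.sh

    . /tmp/exports.sh

    for v in PREFIX PKG_CONFIG_PATH LCRSODIR; do
        eval "val=\$$v"
        if [ -n "$val" ]; then echo "$v=$val"; else echo "UNSET: $v"; fi
    done

If any variable prints as ``UNSET``, source ``exports.sh`` again before
running ``./configure``.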
Build and install::

    hg clone http://hg.linux-ha.org/glue
    cd glue
    ./autogen.sh
    ./configure --prefix=$PREFIX --with-daemon-user=${CLUSTER_USER} \
        --with-daemon-group=${CLUSTER_GROUP} --enable-fatal-warnings=no \
        --with-ocf-root=$PREFIX/usr/lib/ocf
    make
    sudo make install

libqb
-----

Clone and build::

    git clone https://github.com/asalkeld/libqb.git
    cd libqb
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

resource-agents
---------------

Clone and build::

    git clone git://github.com/ClusterLabs/resource-agents
    cd resource-agents
    ./autogen.sh && ./configure --prefix=$PREFIX
    make
    sudo make install

If future resource agents get installed in the standard system dir, we want
them to end up in our OCF dir::

    sudo ln -s /opt/ha/usr/lib/ocf /usr/lib/ocf

corosync
--------

Clone, build and install::

    git clone git://github.com/corosync/corosync.git
    cd corosync
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

Pacemaker
---------

Clone, build and install::

    git clone git://github.com/ClusterLabs/pacemaker.git
    cd pacemaker
    ./autogen.sh
    ./configure --prefix=$PREFIX --without-cman \
        --without-heartbeat --with-corosync \
        --enable-fatal-warnings=no --with-lcrso-dir=$LCRSODIR
    make
    sudo make install

The configure script should give a result like this:

.. code-block:: none

    Version                  = 1.1.8 (Build: f94e1e4)
    Features                 = libqb-logging libqb-ipc lha-fencing upstart systemd nagios corosync-native
    Prefix                   = /opt/ha
    Executables              = /opt/ha/sbin
    Man pages                = /opt/ha/share/man
    Libraries                = /opt/ha/lib
    Header files             = /opt/ha/include
    Arch-independent files   = /opt/ha/share
    State information        = /opt/ha/var
    System configuration     = /opt/ha/etc
    Corosync Plugins         = /opt/ha/lib
    Use system LTDL          = yes
    HA group name            = haclient
    HA user name             = hacluster
    CFLAGS                   = -I/opt/ha/include -I/opt/ha/include -I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include -ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes -Wwrite-strings
    Libraries                = -lcorosync_common -lplumb -lpils -lqb -lbz2 -lxslt -lxml2 -lc -lglib-2.0 -lglib-2.0 -luuid -lrt -ldl -lglib-2.0 -lltdl -L/opt/ha/lib -lqb -ldl -lrt -lpthread
    Stack Libraries          = -L/opt/ha/lib -lqb -ldl -lrt -lpthread -L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap -L/opt/ha/lib -lquorum

crmsh
-----

Clone, build and install::

    hg clone http://hg.savannah.nongnu.org/hgweb/crmsh/
    cd crmsh
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

Cluster configuration
=====================

This is a fast and basic cluster configuration.

Put these environment variables at the end of your ``.bashrc`` (both for
your user and for root)::

    export PATH=/opt/ha/sbin:$PATH
    export MANPATH=/opt/ha/share/man:$MANPATH
    export PYTHONPATH=/opt/ha/lib/python2.7/site-packages

Copy the corosync init script to init.d::

    sudo cp /opt/ha/etc/init.d/corosync /etc/init.d/corosync

Create the corosync config file (``corosync.conf``)::

    totem {
        version: 2

        crypto_cipher: none
        crypto_hash: none

        cluster_name: fiestaha

        interface {
            # Rings must be consecutively numbered, starting at 0.
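            # Note (assumption, not from the original guide): if multicast
            # is filtered on your network, corosync 2.x can run over unicast
            # instead -- set "transport: udpu" in the totem section and
            # declare each node in a nodelist { } block in place of
            # mcastaddr/mcastport below.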
            ringnumber: 0
            ttl: 1
            bindnetaddr: 192.168.132.11
            mcastaddr: 226.94.1.1
            mcastport: 5405
        }
    }

    logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: local7
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        timestamp: on
        logger_subsys {
            subsys: QUORUM
            debug: off
        }
    }

    quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
        wait_for_all: 0
    }

Start cluster services
----------------------

TODO: good rsyslog config

In ``/etc/rsyslog.d/50-debian-default``::

    local7.*                                /var/log/corosync.log
    *.*;auth,authpriv.none;local7.none      -/var/log/syslog

Start the cluster services::

    sudo service corosync start
    sudo service pacemaker start

Logs should appear in /var/log/corosync.log, via the rsyslog daemon.

We can see the cluster config with crm: ``sudo -i crm configure show``::

    bercab@fiesta-ha1:/usr/src$ sudo -i crm configure show
    node $id="193243328" fiesta-ha1
    property $id="cib-bootstrap-options" \
        dc-version="1.1.8-f94e1e4" \
        cluster-infrastructure="corosync" \
        stonith-enabled="false"

TODO: show crm_mon

Corosync output: ``corosync -f``::

    Feb 14 09:08:23 notice  [MAIN  ] Corosync Cluster Engine ('2.3.0.3-6617'): started and ready to provide service.
    Feb 14 09:08:23 info    [MAIN  ] Corosync built-in features: pie relro bindnow
    Feb 14 09:08:23 notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
    Feb 14 09:08:23 notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
    Feb 14 09:08:23 notice  [TOTEM ] The network interface [192.168.132.11] is now up.
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync configuration service [1]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
    Feb 14 09:08:23 notice  [QUORUM] Using quorum provider corosync_votequorum
    Feb 14 09:08:23 notice  [QUORUM] This node is within the primary component and will provide service.
    Feb 14 09:08:23 notice  [QUORUM] Members[0]:
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
    Feb 14 09:08:23 notice  [TOTEM ] A processor joined or left the membership and a new membership (192.168.132.11:4) was formed.
    Feb 14 09:08:23 notice  [QUORUM] Members[1]: 193243328
    Feb 14 09:08:23 notice  [MAIN  ] Completed service synchronization, ready to provide service.

Part 2: Cluster Filesystems: DRBD, DLM, GFS2
============================================

DRBD 8.4
--------

The main docs about the drbd install are here:
http://www.drbd.org/users-guide/s-build-deb.html

Install dependencies::

    sudo aptitude install dpkg-dev fakeroot debhelper debconf-utils docbook-xml \
        docbook-xsl dpatch flex xsltproc module-assistant

Clone the drbd git project::

    git clone git://git.drbd.org/drbd-8.4.git

Steps:

1. ``dpkg-buildpackage -rfakeroot -b -uc``
2. ``sudo dpkg -i drbd8-utils_8.4.3-0_amd64.deb drbd8-module-source_8.4.3-0_all.deb``
3. ``module-assistant auto-install drbd8``

Write the configuration ``/etc/drbd.d/ha.res``::

    resource ha0 {
        net {
            protocol C;
            allow-two-primaries yes;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
            sndbuf-size 0;
            max-buffers 4000;
            max-epoch-size 4000;
        }
        startup {
            become-primary-on both;
        }
        disk {
            # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
            # disk-drain md-flushes resync-rate resync-after al-extents
            # c-plan-ahead c-delay-target c-fill-target c-max-rate
            # c-min-rate disk-timeout
            fencing resource-only;
        }
        handlers {
            pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
            # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
            before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
            after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
            #fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            #after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        volume 0 {
            device /dev/drbd0;
            disk /dev/vgha/lvdrbd0;
            meta-disk internal;
        }
        volume 1 {
            device /dev/drbd1;
            disk /dev/vgha/lvdrbd1;
            meta-disk internal;
        }
        volume 2 {
            device /dev/drbd2;
            disk /dev/vgha/lvdrbd2;
            meta-disk internal;
        }
        on fiesta-ha1 {
            address 192.168.132.11:7789;
        }
        on fiesta-ha2 {
            address 192.168.132.12:7789;
        }
    }

Initial config::

    sudo drbdadm create-md resource
    sudo drbdadm up resource
    sudo drbdadm primary --force resource

DRBD: create a new volume
-------------------------

Main guide: http://www.drbd.org/users-guide/s-lvm-add-pv.html

As a first step, we will put the cluster in maintenance mode::

    crm configure property maintenance-mode=true

On both nodes::

    drbdadm adjust ha0 -d
    drbdadm adjust ha0

It gives an error. Run::

    drbdmeta 2 v08 /dev/vg00/lvdrbd2 internal create-md
    drbdadm adjust ha0 -d

DLM
---

The dlm_controld code comes with no configure script, so in order to make
things work we have patched the Makefiles for our /opt/ha settings. See the
patch here: https://gist.github.com/bercab/4951303

Build and install::

    git clone http://git.fedorahosted.org/git/dlm.git
    cd dlm
    wget https://gist.github.com/bercab/4951303/raw/71c9a5846099b72e2b208f300168c60b1421cf65/dlm.patch
    patch -p1 < dlm.patch
    make
    sudo make install

Parameters can be passed to dlm_controld in two ways:

1. a config file
2. the command line

In case (2), the pacemaker primitive for controld can use arguments.
However, we prefer to put these parameters in ``/etc/dlm/dlm.conf``::

    enable_quorum_lockspace=0
    log_debug=1
    debug_logfile=1

    # other disabled options, useful for debugging:
    #daemon_debug=1
    #foreground=1
    #enable_startup_fencing=0
    #enable_fencing=0
    #enable_quorum_fencing=0
    #fence_all dlm_stonith

**Note**: current docs about using the kernel dlm_controld use the ``-q``
option (or ``enable_quorum_fencing`` in ``dlm.conf``). As we set the *two
node cluster* options correctly in our ``corosync.conf``, we shouldn't need
this parameter.

For reference, use the manual pages; there is much more info there than you
will find by googling:

- man dlm_controld
- man dlm.conf

Test dlm_controld
~~~~~~~~~~~~~~~~~

dlm_controld will be called by the **controld** resource agent, but in order
to test it, it is good practice to first run it on the command line and
debug possible errors. As previous steps, we should have the dlm module
installed, configfs mounted and corosync running.
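Whether those prerequisites are already in place can be checked with a small
sketch (illustrative only; the module and mount state depend on your host,
and configfs may be built into the kernel rather than loaded as a module)::

    #!/bin/sh
    # Illustrative check of the dlm_controld prerequisites.
    check_module() {
        if grep -q "^$1 " /proc/modules 2>/dev/null; then
            echo "$1: loaded"
        else
            # may also mean the code is built into the kernel
            echo "$1: not listed in /proc/modules"
        fi
    }
    check_module dlm
    check_module configfs
    if [ -d /sys/kernel/config/dlm ]; then
        echo "configfs mounted, dlm registered"
    else
        echo "configfs not mounted (or dlm not started)"
    fi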
Corosync is needed because dlm uses corosync as its messaging layer::

    sudo modprobe configfs
    sudo modprobe dlm
    sudo mount -t configfs none /sys/kernel/config
    sudo service corosync start

Run ``dlm_controld --foreground --daemon_debug``. The result should look
like this::

    root@fiesta-ha1:~# dlm_controld --foreground --daemon_debug
    64545 config file log_debug = 1 cli_set 0 use 1
    64545 config file debug_logfile = 1 cli_set 0 use 1
    64545 dlm_controld 4.0.0 started
    64545 our_nodeid 193243328
    64545 found /dev/misc/dlm-control minor 57
    64545 found /dev/misc/dlm-monitor minor 56
    64545 found /dev/misc/dlm_plock minor 55
    64545 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
    64545 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
    64545 set log_debug 1
    64545 set recover_callbacks 1
    64545 cmap totem.cluster_name = 'fiestaha'
    64545 set cluster_name fiestaha
    64545 /dev/misc/dlm-monitor fd 10
    64545 cluster quorum 1 seq 4 nodes 1
    64545 cluster node 193243328 added seq 4
    64545 set_configfs_node 193243328 192.168.132.11 local 1
    64545 cpg_join dlm:controld ...
    64545 setup_cpg_daemon 12
    64545 dlm:controld conf 1 1 0 memb 193243328 join 193243328 left
    64545 fence work wait for cluster ringid
    64545 dlm:controld ring 193243328:4 1 memb 193243328
    64545 fence_in_progress_unknown 0 startup
    64545 receive_protocol 193243328 max 3.1.1.0 run 0.0.0.0
    64545 daemon node 193243328 prot max 0.0.0.0 run 0.0.0.0
    64545 daemon node 193243328 save max 3.1.1.0 run 0.0.0.0
    64545 set_protocol member_count 1 propose daemon 3.1.1 kernel 1.1.1
    64545 receive_protocol 193243328 max 3.1.1.0 run 3.1.1.0
    64545 daemon node 193243328 prot max 3.1.1.0 run 0.0.0.0
    64545 daemon node 193243328 save max 3.1.1.0 run 3.1.1.0
    64545 run protocol from nodeid 193243328
    64545 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
    64545 plocks 13
    64545 receive_protocol 193243328 max 3.1.1.0 run 3.1.1.0

At this point corosync should log something like this::

    Feb 14 10:54:20 debug   [QUORUM] lib_init_fn: conn=0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] got quorum_type request on 0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] got trackstart request on 0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] sending initial status to 0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] sending quorum notification to 0x7f286c87b4f0, length = 52

We should now be able to start drbd and mount the filesystem::

    service drbd start
    mount -t gfs2 /dev/drbd1 /mnt/fiestacomm

In syslog, you should see something like this::

    Feb 14 10:59:54 fiesta-ha1 dlm_controld[23971]: 70232 dlm_controld 4.0.0 started
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.395327] GFS2: fsid=fiestaha:commfs: Trying to join cluster "lock_dlm", "fiestaha:commfs"
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.395484] dlm: Using TCP for communications
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399774] dlm: commfs: dlm_recover 1
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399810] dlm: commfs: add member 193243328
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399812] dlm: commfs: dlm_recover_members 1 nodes
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399815] dlm: commfs: generation 1 slots 1 1:193243328
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399817] dlm: commfs: dlm_recover_directory
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399834] dlm: commfs: dlm_recover_directory 0 entries
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399837] dlm: commfs: dlm_callback_resume 0
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399853] dlm: commfs: dlm_recover 1 generation 1 done: 0 ms
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399877] dlm: commfs: joining the lockspace group...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399878] dlm: commfs: group event done 0 0
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399878] dlm: commfs: join complete
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.900053] GFS2: fsid=fiestaha:commfs: first mounter control generation 0
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.900056] GFS2: fsid=fiestaha:commfs: Joined cluster. Now mounting FS...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.918241] GFS2: fsid=fiestaha:commfs.0: jid=0, already locked for use
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.918243] GFS2: fsid=fiestaha:commfs.0: jid=0: Looking at journal...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920294] GFS2: fsid=fiestaha:commfs.0: jid=0: Done
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920358] GFS2: fsid=fiestaha:commfs.0: jid=1: Trying to acquire journal lock...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920556] GFS2: fsid=fiestaha:commfs.0: jid=1: Looking at journal...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.926916] GFS2: fsid=fiestaha:commfs.0: jid=1: Done
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.926968] GFS2: fsid=fiestaha:commfs.0: first mount done, others may mount

GFS2-utils
----------

Install gfs2-utils from git::

    git clone http://git.fedorahosted.org/git/gfs2-utils.git
    cd gfs2-utils
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

Prepare GFS2 filesystems
------------------------

TODO: explain parameters of mkfs.gfs2

Create the filesystem: ``sudo mkfs.gfs2 -p lock_dlm -j 2 -t fiestaha:commfs /dev/drbd1``::

    root@fiesta-ha1:~# mkfs.gfs2 -p lock_dlm -j 2 -t fiestaha:commfs /dev/drbd1
    This will destroy any data on /dev/drbd1.
    It appears to contain: data
    Are you sure you want to proceed? [y/n]y
    Device:                    /dev/drbd1
    Block size:                4096
    Device size:               1.00 GB (262127 blocks)
    Filesystem size:           1.00 GB (262125 blocks)
    Journals:                  2
    Resource groups:           4
    Locking protocol:          "lock_dlm"
    Lock table:                "fiestaha:commfs"
    UUID:                      1074a7a0-3498-4553-09f7-97e4b5d95def

Primitives::

    primitive p_fscomm ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/ha0/0" \
        directory="/opt/fiestacomm" fstype="gfs2"
    clone fscomm_clone p_fscomm
    colocation fscomm_ondrbd inf: fscomm_clone ms_drbd_ha0:Master
    order fscomm_after_dlm_drbd inf: ms_drbd_ha0:promote dlm_clone:start fscomm_clone:start
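The constraints above reference ``dlm_clone`` and ``ms_drbd_ha0``, which are
not defined elsewhere in this document. A minimal sketch of what those
resources could look like (the resource names, meta attributes and the use of
the **controld** and linbit **drbd** agents are assumptions, not taken from a
tested configuration)::

    # assumed sketch: dlm_controld managed by the controld agent, cloned
    primitive p_dlm ocf:pacemaker:controld
    clone dlm_clone p_dlm meta interleave="true"
    # assumed sketch: dual-primary drbd resource ha0, promoted on both nodes
    primitive p_drbd_ha0 ocf:linbit:drbd params drbd_resource="ha0"
    ms ms_drbd_ha0 p_drbd_ha0 \
        meta master-max="2" clone-max="2" notify="true" interleave="true"

``master-max="2"`` matches the ``become-primary-on both;`` setting in the
drbd resource above; adjust the names to your own cluster before use.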