.. highlight:: sh
   :linenothreshold: 1

===========================
Ubuntu cluster from scratch
===========================

Introduction
============

The main objective is to build a cluster based on corosync 2.0, or what
Andrew Beekhof calls *option 3*: everyone talks to corosync 2.0.

To clarify the currently possible corosync architectures, you may want to
read this post from Andrew:
http://theclusterguy.clusterlabs.org/post/34604901720/pacemaker-and-cluster-filesystems

We want to do it on top of Ubuntu (the current LTS is 12.04, precise). The
current Ubuntu cluster stack is based on *option 2*: everyone talks to CMAN.
This is the main source on the state of the Ubuntu cluster stack:
https://wiki.ubuntu.com/ClusterStack/Precise

While option 2 is the safest approach for the next couple of years, we want
to build on *option 3*, as it is the future option, architecturally superior
to options 1 and 2. So this document is about building an *option 3* cluster
on top of Ubuntu precise.

TODO: link main references of doc: http://clusterlabs.org/doc/

TODO: explain current problem with OCFS2

TODO: explain GFS2 cluster filesystem needs and issues

Cluster Components
==================

TODO: intro and explain base components and extra components for gfs2

Main cluster components
-----------------------

* cluster-glue
* resource-agents
* libqb
* corosync
* pacemaker
* dlm

Other requirements
------------------

* kernel 3.4 (for dlm_controld locking)
* drbd
* gfs2

Build
=====

**Important note**: in order to prevent compile errors, you must start from a
**clean environment**, with no previous versions of the components you are
building installed on your system. Be sure to make these checks before
continuing:

- Remove all cluster-related deb packages installed on your system: cman,
  corosync, pacemaker, libqb, cluster-glue, etc.
- If you are compiling a previously installed package (for example, you are
  compiling a git-updated corosync while the previous corosync is still
  installed), the old includes and libs can produce lots of compile errors,
  because the build picks up the old installed headers. Continuing with the
  corosync example, you should remove ``$PREFIX/include/corosync`` before
  compiling the new corosync version.

Preparing the environment
-------------------------

Create a script, ``$HOME/exports.sh``, to set your environment variables::

    #!/bin/sh
    export PREFIX=/opt/ha
    export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig
    export LCRSODIR=$PREFIX/libexec/lcrso
    export LDFLAGS=-L$PREFIX/lib
    export CPPFLAGS=-I$PREFIX/include
    export CFLAGS=-I$PREFIX/include
    export CLUSTER_USER=hacluster
    export CLUSTER_GROUP=haclient

Before building any component, always do: ``source $HOME/exports.sh``

Create the hacluster user and haclient group. This example uses *uid* and
*gid* 120, but you can choose any free ones::

    sudo groupadd -g 120 haclient
    sudo useradd -u 120 -g 120 -s /bin/false -d /usr/lib/heartbeat hacluster

Install the needed development packages::

    sudo aptitude install build-essential git mercurial
    sudo aptitude install autogen automake libtool pkg-config groff autopoint bison
    sudo aptitude install libncurses-dev libreadline-dev
    sudo aptitude install libaio-dev libglib2.0-dev libxml2-dev libbz2-dev uuid-dev
    sudo aptitude install libnss3-dev libxslt-dev
    # for gfs2-utils:
    sudo aptitude install libblkid-dev check
    # for crmsh:
    sudo aptitude install python-lxml

cluster-glue
------------

cluster-glue is the base glue for corosync and pacemaker. It provides
development headers with variables about the cluster environment, like
common cluster paths. As an example, it creates the include file
``include/heartbeat/glue-config.h``, with common definitions for components
like corosync and pacemaker.
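Forgetting the ``source`` step is a common cause of configure failures. A
minimal self-check sketch (illustrative only: the ``/tmp`` copy stands in for
``$HOME/exports.sh``, which you would normally source directly)::

    #!/bin/sh
    # Illustrative only: write a minimal exports file and confirm the
    # variables the builds rely on are really set in the current shell.
    printf 'export PREFIX=/opt/ha\n' > /tmp/exports.sh
    printf 'export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig\n' >> /tmp/exports.sh
    printf 'export LCRSODIR=$PREFIX/libexec/lcrso\n' >> /tmp/exports.sh

    . /tmp/exports.sh

    for v in PREFIX PKG_CONFIG_PATH LCRSODIR; do
        eval "val=\$$v"
        if [ -n "$val" ]; then echo "$v=$val"; else echo "UNSET: $v"; fi
    done

If any variable prints as ``UNSET``, source ``exports.sh`` again before
running ``./configure``.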
Build and install::

    hg clone http://hg.linux-ha.org/glue
    cd glue
    ./autogen.sh
    ./configure --prefix=$PREFIX --with-daemon-user=${CLUSTER_USER} \
        --with-daemon-group=${CLUSTER_GROUP} --enable-fatal-warnings=no \
        --with-ocf-root=$PREFIX/usr/lib/ocf
    make
    sudo make install

libqb
-----

Clone and build::

    git clone https://github.com/asalkeld/libqb.git
    cd libqb
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

resource-agents
---------------

Clone and build::

    git clone git://github.com/ClusterLabs/resource-agents
    cd resource-agents
    ./autogen.sh && ./configure --prefix=$PREFIX
    make
    sudo make install

If future resource agents get installed in the standard system dir, we want
them to end up in our OCF dir::

    sudo ln -s /opt/ha/usr/lib/ocf /usr/lib/ocf

corosync
--------

Clone, build and install::

    git clone git://github.com/corosync/corosync.git
    cd corosync
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

Pacemaker
---------

Clone, build and install::

    git clone git://github.com/ClusterLabs/pacemaker.git
    cd pacemaker
    ./autogen.sh
    ./configure --prefix=$PREFIX --without-cman \
        --without-heartbeat --with-corosync \
        --enable-fatal-warnings=no --with-lcrso-dir=$LCRSODIR
    make
    sudo make install

The configure script should give a result like this:

.. code-block:: none

    Version                  = 1.1.8 (Build: f94e1e4)
    Features                 = libqb-logging libqb-ipc lha-fencing upstart systemd nagios corosync-native
    Prefix                   = /opt/ha
    Executables              = /opt/ha/sbin
    Man pages                = /opt/ha/share/man
    Libraries                = /opt/ha/lib
    Header files             = /opt/ha/include
    Arch-independent files   = /opt/ha/share
    State information        = /opt/ha/var
    System configuration     = /opt/ha/etc
    Corosync Plugins         = /opt/ha/lib
    Use system LTDL          = yes
    HA group name            = haclient
    HA user name             = hacluster
    CFLAGS                   = -I/opt/ha/include -I/opt/ha/include -I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include -ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes -Wwrite-strings
    Libraries                = -lcorosync_common -lplumb -lpils -lqb -lbz2 -lxslt -lxml2 -lc -lglib-2.0 -lglib-2.0 -luuid -lrt -ldl -lglib-2.0 -lltdl -L/opt/ha/lib -lqb -ldl -lrt -lpthread
    Stack Libraries          = -L/opt/ha/lib -lqb -ldl -lrt -lpthread -L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap -L/opt/ha/lib -lquorum

crmsh
-----

Clone, build and install::

    hg clone http://hg.savannah.nongnu.org/hgweb/crmsh/
    cd crmsh
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

Cluster configuration
=====================

This is a fast and basic cluster configuration.

Put these environment variables at the end of your ``.bashrc`` (both for
your user and for root)::

    export PATH=/opt/ha/sbin:$PATH
    export MANPATH=/opt/ha/share/man:$MANPATH
    export PYTHONPATH=/opt/ha/lib/python2.7/site-packages

Copy the corosync init script to init.d::

    sudo cp /opt/ha/etc/init.d/corosync /etc/init.d/corosync

Create the corosync config file (``corosync.conf``)::

    totem {
        version: 2

        crypto_cipher: none
        crypto_hash: none

        cluster_name: fiestaha

        interface {
            # Rings must be consecutively numbered, starting at 0.
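            # Note (assumption, not from the original guide): if multicast
            # is filtered on your network, corosync 2.x can run over unicast
            # instead -- set "transport: udpu" in the totem section and
            # declare each node in a nodelist { } block in place of
            # mcastaddr/mcastport below.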
            ringnumber: 0
            ttl: 1
            bindnetaddr: 192.168.132.11
            mcastaddr: 226.94.1.1
            mcastport: 5405
        }
    }

    logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: local7
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        timestamp: on
        logger_subsys {
            subsys: QUORUM
            debug: off
        }
    }

    quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
        wait_for_all: 0
    }

Start cluster services
----------------------

TODO: good rsyslog config

In ``/etc/rsyslog.d/50-debian-default``::

    local7.*                                /var/log/corosync.log
    *.*;auth,authpriv.none;local7.none      -/var/log/syslog

Start the cluster services::

    sudo service corosync start
    sudo service pacemaker start

Logs should appear in /var/log/corosync.log, via the rsyslog daemon.

We can see the cluster config with crm: ``sudo -i crm configure show``::

    bercab@fiesta-ha1:/usr/src$ sudo -i crm configure show
    node $id="193243328" fiesta-ha1
    property $id="cib-bootstrap-options" \
        dc-version="1.1.8-f94e1e4" \
        cluster-infrastructure="corosync" \
        stonith-enabled="false"

TODO: show crm_mon

Corosync output: ``corosync -f``::

    Feb 14 09:08:23 notice  [MAIN  ] Corosync Cluster Engine ('2.3.0.3-6617'): started and ready to provide service.
    Feb 14 09:08:23 info    [MAIN  ] Corosync built-in features: pie relro bindnow
    Feb 14 09:08:23 notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
    Feb 14 09:08:23 notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
    Feb 14 09:08:23 notice  [TOTEM ] The network interface [192.168.132.11] is now up.
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync configuration service [1]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
    Feb 14 09:08:23 notice  [QUORUM] Using quorum provider corosync_votequorum
    Feb 14 09:08:23 notice  [QUORUM] This node is within the primary component and will provide service.
    Feb 14 09:08:23 notice  [QUORUM] Members[0]:
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
    Feb 14 09:08:23 notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
    Feb 14 09:08:23 notice  [TOTEM ] A processor joined or left the membership and a new membership (192.168.132.11:4) was formed.
    Feb 14 09:08:23 notice  [QUORUM] Members[1]: 193243328
    Feb 14 09:08:23 notice  [MAIN  ] Completed service synchronization, ready to provide service.

Part 2: Cluster Filesystems: DRBD, DLM, GFS2
============================================

DRBD 8.4
--------

The main docs about the drbd install are here:
http://www.drbd.org/users-guide/s-build-deb.html

Install dependencies::

    sudo aptitude install dpkg-dev fakeroot debhelper debconf-utils docbook-xml \
        docbook-xsl dpatch flex xsltproc module-assistant

Clone the drbd git project::

    git clone git://git.drbd.org/drbd-8.4.git

Steps:

1. ``dpkg-buildpackage -rfakeroot -b -uc``
2. ``sudo dpkg -i drbd8-utils_8.4.3-0_amd64.deb drbd8-module-source_8.4.3-0_all.deb``
3. ``module-assistant auto-install drbd8``

Write the configuration ``/etc/drbd.d/ha.res``::

    resource ha0 {
        net {
            protocol C;
            allow-two-primaries yes;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
            sndbuf-size 0;
            max-buffers 4000;
            max-epoch-size 4000;
        }
        startup {
            become-primary-on both;
        }
        disk {
            # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
            # disk-drain md-flushes resync-rate resync-after al-extents
            # c-plan-ahead c-delay-target c-fill-target c-max-rate
            # c-min-rate disk-timeout
            fencing resource-only;
        }
        handlers {
            pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
            # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
            before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
            after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
            #fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            #after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        volume 0 {
            device /dev/drbd0;
            disk /dev/vgha/lvdrbd0;
            meta-disk internal;
        }
        volume 1 {
            device /dev/drbd1;
            disk /dev/vgha/lvdrbd1;
            meta-disk internal;
        }
        volume 2 {
            device /dev/drbd2;
            disk /dev/vgha/lvdrbd2;
            meta-disk internal;
        }
        on fiesta-ha1 {
            address 192.168.132.11:7789;
        }
        on fiesta-ha2 {
            address 192.168.132.12:7789;
        }
    }

Initial config::

    sudo drbdadm create-md resource
    sudo drbdadm up resource
    sudo drbdadm primary --force resource

DRBD: create a new volume
-------------------------

Main guide: http://www.drbd.org/users-guide/s-lvm-add-pv.html

As a first step, we will put the cluster in maintenance mode::

    crm configure property maintenance-mode=true

On both nodes::

    drbdadm adjust ha0 -d
    drbdadm adjust ha0

It gives an error. Run::

    drbdmeta 2 v08 /dev/vg00/lvdrbd2 internal create-md
    drbdadm adjust ha0 -d

DLM
---

The dlm_controld code comes with no configure script, so in order to make
things work we have patched the Makefiles for our /opt/ha settings. See the
patch here: https://gist.github.com/bercab/4951303

Build and install::

    git clone http://git.fedorahosted.org/git/dlm.git
    cd dlm
    wget https://gist.github.com/bercab/4951303/raw/71c9a5846099b72e2b208f300168c60b1421cf65/dlm.patch
    patch -p1 < dlm.patch
    make
    sudo make install

Parameters can be passed to dlm_controld in two ways:

1. a config file
2. the command line

In case (2), the pacemaker primitive for controld can use arguments.
However, we prefer to put these parameters in ``/etc/dlm/dlm.conf``::

    enable_quorum_lockspace=0
    log_debug=1
    debug_logfile=1

    # other disabled options, useful for debugging:
    #daemon_debug=1
    #foreground=1
    #enable_startup_fencing=0
    #enable_fencing=0
    #enable_quorum_fencing=0
    #fence_all dlm_stonith

**Note**: current docs about using the kernel dlm_controld use the ``-q``
option (or ``enable_quorum_fencing`` in ``dlm.conf``). As we set the *two
node cluster* options correctly in our ``corosync.conf``, we shouldn't need
this parameter.

For reference, use the manual pages; there is much more info there than you
will find by googling:

- man dlm_controld
- man dlm.conf

Test dlm_controld
~~~~~~~~~~~~~~~~~

dlm_controld will be called by the **controld** resource agent, but in order
to test it, it is good practice to first run it on the command line and
debug possible errors. As previous steps, we should have the dlm module
installed, configfs mounted and corosync running.
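Whether those prerequisites are already in place can be checked with a small
sketch (illustrative only; the module and mount state depend on your host,
and configfs may be built into the kernel rather than loaded as a module)::

    #!/bin/sh
    # Illustrative check of the dlm_controld prerequisites.
    check_module() {
        if grep -q "^$1 " /proc/modules 2>/dev/null; then
            echo "$1: loaded"
        else
            # may also mean the code is built into the kernel
            echo "$1: not listed in /proc/modules"
        fi
    }
    check_module dlm
    check_module configfs
    if [ -d /sys/kernel/config/dlm ]; then
        echo "configfs mounted, dlm registered"
    else
        echo "configfs not mounted (or dlm not started)"
    fi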
Corosync is needed because dlm uses corosync as its messaging layer::

    sudo modprobe configfs
    sudo modprobe dlm
    sudo mount -t configfs none /sys/kernel/config
    sudo service corosync start

Run ``dlm_controld --foreground --daemon_debug``. The result should look
like this::

    root@fiesta-ha1:~# dlm_controld --foreground --daemon_debug
    64545 config file log_debug = 1 cli_set 0 use 1
    64545 config file debug_logfile = 1 cli_set 0 use 1
    64545 dlm_controld 4.0.0 started
    64545 our_nodeid 193243328
    64545 found /dev/misc/dlm-control minor 57
    64545 found /dev/misc/dlm-monitor minor 56
    64545 found /dev/misc/dlm_plock minor 55
    64545 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
    64545 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
    64545 set log_debug 1
    64545 set recover_callbacks 1
    64545 cmap totem.cluster_name = 'fiestaha'
    64545 set cluster_name fiestaha
    64545 /dev/misc/dlm-monitor fd 10
    64545 cluster quorum 1 seq 4 nodes 1
    64545 cluster node 193243328 added seq 4
    64545 set_configfs_node 193243328 192.168.132.11 local 1
    64545 cpg_join dlm:controld ...
    64545 setup_cpg_daemon 12
    64545 dlm:controld conf 1 1 0 memb 193243328 join 193243328 left
    64545 fence work wait for cluster ringid
    64545 dlm:controld ring 193243328:4 1 memb 193243328
    64545 fence_in_progress_unknown 0 startup
    64545 receive_protocol 193243328 max 3.1.1.0 run 0.0.0.0
    64545 daemon node 193243328 prot max 0.0.0.0 run 0.0.0.0
    64545 daemon node 193243328 save max 3.1.1.0 run 0.0.0.0
    64545 set_protocol member_count 1 propose daemon 3.1.1 kernel 1.1.1
    64545 receive_protocol 193243328 max 3.1.1.0 run 3.1.1.0
    64545 daemon node 193243328 prot max 3.1.1.0 run 0.0.0.0
    64545 daemon node 193243328 save max 3.1.1.0 run 3.1.1.0
    64545 run protocol from nodeid 193243328
    64545 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
    64545 plocks 13
    64545 receive_protocol 193243328 max 3.1.1.0 run 3.1.1.0

At this point corosync should log something like this::

    Feb 14 10:54:20 debug   [QUORUM] lib_init_fn: conn=0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] got quorum_type request on 0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] got trackstart request on 0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] sending initial status to 0x7f286c87b4f0
    Feb 14 10:54:20 debug   [QUORUM] sending quorum notification to 0x7f286c87b4f0, length = 52

We should now be able to start drbd and mount the filesystem::

    service drbd start
    mount -t gfs2 /dev/drbd1 /mnt/fiestacomm

In syslog, you should see something like this::

    Feb 14 10:59:54 fiesta-ha1 dlm_controld[23971]: 70232 dlm_controld 4.0.0 started
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.395327] GFS2: fsid=fiestaha:commfs: Trying to join cluster "lock_dlm", "fiestaha:commfs"
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.395484] dlm: Using TCP for communications
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399774] dlm: commfs: dlm_recover 1
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399810] dlm: commfs: add member 193243328
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399812] dlm: commfs: dlm_recover_members 1 nodes
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399815] dlm: commfs: generation 1 slots 1 1:193243328
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399817] dlm: commfs: dlm_recover_directory
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399834] dlm: commfs: dlm_recover_directory 0 entries
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399837] dlm: commfs: dlm_callback_resume 0
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399853] dlm: commfs: dlm_recover 1 generation 1 done: 0 ms
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399877] dlm: commfs: joining the lockspace group...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399878] dlm: commfs: group event done 0 0
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.399878] dlm: commfs: join complete
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.900053] GFS2: fsid=fiestaha:commfs: first mounter control generation 0
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.900056] GFS2: fsid=fiestaha:commfs: Joined cluster. Now mounting FS...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.918241] GFS2: fsid=fiestaha:commfs.0: jid=0, already locked for use
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.918243] GFS2: fsid=fiestaha:commfs.0: jid=0: Looking at journal...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920294] GFS2: fsid=fiestaha:commfs.0: jid=0: Done
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920358] GFS2: fsid=fiestaha:commfs.0: jid=1: Trying to acquire journal lock...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.920556] GFS2: fsid=fiestaha:commfs.0: jid=1: Looking at journal...
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.926916] GFS2: fsid=fiestaha:commfs.0: jid=1: Done
    Feb 14 11:06:12 fiesta-ha1 kernel: [70610.926968] GFS2: fsid=fiestaha:commfs.0: first mount done, others may mount

GFS2-utils
----------

Install gfs2-utils from git::

    git clone http://git.fedorahosted.org/git/gfs2-utils.git
    cd gfs2-utils
    ./autogen.sh
    ./configure --prefix=$PREFIX
    make
    sudo make install

Prepare GFS2 filesystems
------------------------

TODO: explain parameters of mkfs.gfs2

Create the filesystem: ``sudo mkfs.gfs2 -p lock_dlm -j 2 -t fiestaha:commfs /dev/drbd1``::

    root@fiesta-ha1:~# mkfs.gfs2 -p lock_dlm -j 2 -t fiestaha:commfs /dev/drbd1
    This will destroy any data on /dev/drbd1.
    It appears to contain: data
    Are you sure you want to proceed? [y/n]y
    Device:                    /dev/drbd1
    Block size:                4096
    Device size:               1.00 GB (262127 blocks)
    Filesystem size:           1.00 GB (262125 blocks)
    Journals:                  2
    Resource groups:           4
    Locking protocol:          "lock_dlm"
    Lock table:                "fiestaha:commfs"
    UUID:                      1074a7a0-3498-4553-09f7-97e4b5d95def

Primitives::

    primitive p_fscomm ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/ha0/0" \
        directory="/opt/fiestacomm" fstype="gfs2"
    clone fscomm_clone p_fscomm
    colocation fscomm_ondrbd inf: fscomm_clone ms_drbd_ha0:Master
    order fscomm_after_dlm_drbd inf: ms_drbd_ha0:promote dlm_clone:start fscomm_clone:start
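The constraints above reference ``dlm_clone`` and ``ms_drbd_ha0``, which are
not defined elsewhere in this document. A minimal sketch of what those
resources could look like (the resource names, meta attributes and the use of
the **controld** and linbit **drbd** agents are assumptions, not taken from a
tested configuration)::

    # assumed sketch: dlm_controld managed by the controld agent, cloned
    primitive p_dlm ocf:pacemaker:controld
    clone dlm_clone p_dlm meta interleave="true"
    # assumed sketch: dual-primary drbd resource ha0, promoted on both nodes
    primitive p_drbd_ha0 ocf:linbit:drbd params drbd_resource="ha0"
    ms ms_drbd_ha0 p_drbd_ha0 \
        meta master-max="2" clone-max="2" notify="true" interleave="true"

``master-max="2"`` matches the ``become-primary-on both;`` setting in the
drbd resource above; adjust the names to your own cluster before use.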