The N Files

THE MIRRORED LITURGY

Purpose

The objective of THE MIRRORED LITURGY is the establishment of a ritual architecture for secure, high-availability containment, executed via Red Hat Enterprise Linux 9.3 (codename: Redhood-Phlogiston). The system comprises two consecrated nodes in cluster formation, bound by Corosync and Pacemaker, which jointly officiate over a sanctified NFS export accessible through a floating IP address (hereafter: the Witness Address).

Data fidelity between officiants is maintained through DRBD — a mirrored invocation protocol known to occasionally emit audible weeping on failed syncs. Replication is synchronous, redundant, and best not interrupted unless you enjoy ghosts in your journaling layer.

Each node is backed by hardware RAID5 arrays, abstracted through MD devices. Logical Volume Management (LVM) is grafted onto these like a second nervous system, supporting flexible partitioning and easier ritual exorcisms.

Encrypted Volumes (blessed and sealed)

Root /
/var
/home

All volumes are encrypted using LUKS and unlocked automatically via TPM2-bound keys — ensuring unattended boot with the blessing of the machine’s soul, and without intervention from human clerics. This removes the need for mortal passphrase entry at startup and fulfills the minimum requirements for autonomous resurrection compliance.
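On RHEL 9, Red Hat documents Clevis as the route to TPM2-bound LUKS unlocking. The sketch below walks through the sealing steps for the three volumes. By default it only prints the commands (DRY_RUN=1). The device paths are hypothetical. Binding to PCR 7 (Secure Boot state) is an assumed policy, not something this rite prescribes. The package and unit names follow Red Hat's Clevis documentation, so check them against your release.

```bash
#!/usr/bin/env bash
# Sketch: seal existing LUKS volumes to the TPM2 with Clevis.
# ASSUMPTIONS: device paths are hypothetical; PCR 7 is an assumed policy choice.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

bind_tpm2() { # bind_tpm2 <luks-device>...
  for dev in "$@"; do
    # Adds a TPM2-backed keyslot; asks once for an existing passphrase.
    run clevis luks bind -d "$dev" tpm2 '{"pcr_ids":"7"}'
  done
}

seal_node() { # seal_node <root-dev> <var-dev> <home-dev>
  run dnf install -y clevis-luks clevis-dracut clevis-systemd
  bind_tpm2 "$@"
  # Rebuild the initramfs so the root volume unlocks during early boot.
  run dracut -fv --regenerate-all
  # Let Clevis answer the passphrase prompts for non-root volumes (/var, /home).
  run systemctl enable clevis-luks-askpass.path
}

seal_node /dev/sda3 /dev/sda4 /dev/sda5   # hypothetical devices for /, /var, /home
```

Set DRY_RUN=0 only after the printed plan matches the node's real device layout. The existing passphrase keyslot remains as a fallback for when the TPM refuses to cooperate.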

The shared storage area exported via NFS is separately encrypted via an alternative, undocumented (and possibly heretical) method. This technique is outside the scope of this rite but is believed to involve a compact disc, a blowtorch, and cryptsetup in verbose mode.

Node Isolation Protocol (STONITH)

In the event of ritual degradation or node betrayal, STONITH (Shoot The Other Node In The Head) is executed via legacy HP iLO interfaces. These are contacted through a customized OpenSSH 9.6p1 client (hand-forged against OpenSSL 3.2.0), living in /var/lib/pacemaker. This dark artifact was required to maintain compatibility with firmware relics predating coherent TLS.

A dedicated fencing acolyte (hacluster) is created on each iLO and entrusted with administrative powers, despite extensive internal debate and one very strongly worded memo from Internal Audit.

Network Conduits

Ritual traffic is segregated across specialized interfaces:

- Bonded interfaces: for NFS export and client communion.
- Point-to-point interfaces:
  - One strictly for Corosync heartbeat synchronization (aka the Chant).
  - One exclusively for DRBD replication (aka the Mirror Thread).

Each interface is isolated and warded using VLAN tagging and a prayer to St. Ethernet of the Unyielding Duplex.
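With NetworkManager, these conduits can be declared through nmcli. The sketch below prints the commands by default. Several values are hypothetical placeholders: the port names (ens1f0, ens1f1, ens2f0, ens2f1), VLAN ID 100, the LACP bond mode, and every address. VLAN tagging of the point-to-point links is left out for brevity.

```bash
#!/usr/bin/env bash
# Sketch: bonded client link plus dedicated point-to-point links via nmcli.
# ASSUMPTIONS: interface names, VLAN ID, bond mode and addresses are hypothetical.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

bond() { # bond <port>...: LACP bond carrying NFS export and client traffic
  run nmcli connection add type bond con-name bond0 ifname bond0 \
    bond.options "mode=802.3ad,miimon=100"
  for port in "$@"; do
    run nmcli connection add type ethernet slave-type bond \
      con-name "bond0-$port" ifname "$port" master bond0
  done
}

vlan() { # vlan <parent> <id> <cidr>: tagged interface on top of a parent link
  run nmcli connection add type vlan con-name "$1.$2" ifname "$1.$2" \
    vlan.parent "$1" vlan.id "$2" ipv4.method manual ipv4.addresses "$3"
}

p2p() { # p2p <ifname> <cidr>: point-to-point link to the peer node
  run nmcli connection add type ethernet con-name "p2p-$1" ifname "$1" \
    ipv4.method manual ipv4.addresses "$2"
}

bond ens1f0 ens1f1
vlan bond0 100 192.168.100.11/24   # client-facing network (the VIP lives here too)
p2p ens2f0 10.0.1.1/30             # the Chant (Corosync)
p2p ens2f1 10.0.2.1/30             # the Mirror Thread (DRBD)
```

Keeping each conduit in its own connection profile means a failed bond port never disturbs the Chant or the Mirror Thread.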

Scope

This document outlines the complete invocation sequence and containment structure of THE MIRRORED LITURGY. It is intended for use by personnel holding BTL clearance or above. Unauthorized access will be logged, ignored, and eventually blamed on a junior contractor in Prague.

The rites detailed herein encompass the following domains:

TPM2-Bound Encrypted Volumes
The secure sealing of each node’s essence via LUKS volumes bound to the Trusted Platform Module¹.
DRBD-Based Mirrored Sanctum
A secure, replicated volume structure designed to host the NFS-exported shared area. Synchronization is enforced through ritual mirroring; divergence is punishable by reboot².
Cluster Coordination with Corosync & Pacemaker
The Chant and the Orchestrator. Heartbeats are monitored, quorum is worshipped, and resources migrate in response to failure or divine suggestion.
STONITH via iLO/SSH
Implementation of node retribution protocols via SSH into legacy HP iLO interfaces. When the cluster must choose, it chooses violence³.
Multi-Interface Network Architecture
Segregation of traffic into distinct interfaces serves both ritual and technical purposes⁴. Bonded interfaces handle public exposure (NFS and shame), while isolated links carry cluster heartbeats and DRBD invocations in cloistered silence.

Requirements

To summon and sustain THE MIRRORED LITURGY, the following materials and infrastructure must be provisioned by Operations, Procurement, or an unlucky intern with sudo access.

2 × RHEL 9.3 Nodes
Preferably physical. Virtualized instances have a known tendency to scream silently during DRBD ascension events.
TPM2 Support
Both systems must include Trusted Platform Modules⁵. If TPM2 is unavailable, consider binding LUKS to the blood of a sysadmin. Results may vary.
HP iLO Interfaces (Legacy Generation)
Must support SSH-based fencing. Modern versions are too secure to be useful; older versions are just vulnerable enough to obey⁶.
DRBD Utilities
Required for volume replication. Also required for triggering obscure kernel warnings that disappear the moment someone looks over your shoulder.
Shared Floating IP
Used to expose the NFS service to clients and, in at least one documented case, to the metaphysical embodiment of /dev/null.
SSH Key Pair for Fencing Ritual
Public/private keys must be distributed with care. Reuse of internal keys from other operations (see: OPERATION DEAD SOCKET) is discouraged but not uncommon.

DRBD Configuration

The DRBD (Distributed Replicated Block Device) subsystem acts as the blood pact between the two nodes, enabling real-time replication of a block device across the cluster. In the event of node failure — or betrayal — the subjugated node can seamlessly ascend and present a still-warm copy of the shared volume.

Failure to respect DRBD protocol may result in split-brain, data corruption, or worse: inconsistent journal entries⁷.

Global Configuration

Global DRBD configuration resides in /etc/drbd.d/global_common.conf, a file written in syntax only slightly more intelligible than the Necronomicon.

global {
  usage-count yes;
  udev-always-use-vnr;
}

The common configuration block defines ritual handlers, startup thresholds, quorum behavior, and the fencing policy:

common {
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
  startup {
    wfc-timeout 100;
    degr-wfc-timeout 120;
  }
  options {
    quorum 1;
    on-no-quorum suspend-io;
  }
  net {
    # In DRBD 9 the fencing policy belongs to the net section
    fencing resource-only;
  }
}

📎 Internal Memo: "This configuration block was recovered from .bash_history after the operator succumbed to chronic drbdadm exposure. We left the indentation unchanged out of respect."

Resource Configuration

DRBD resource ha_secure is defined in /etc/drbd.d/ha_secure.res. This is the vessel through which all stateful services flow. Respect it accordingly.

resource ha_secure {
  volume 0 {
    device "/dev/drbd1000";
    disk "/dev/secure_vg/secure_lv";
    meta-disk internal;
  }
  on "a8-lb-a-00-2-01" {
    address 10.200.200.5:7789;
    node-id 0;
  }
  on "a8-lb-a-00-2-02" {
    address 10.200.200.6:7789;
    node-id 1;
  }
}
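DRBD selects the `on` section whose name matches the local host name (as reported by uname -n). A typo here therefore leaves a node unable to bring the resource up. The pre-flight sketch below checks for this. It assumes each `on` section opens on its own line, as in ha_secure.res above.

```bash
#!/usr/bin/env bash
# Sketch: check that this host is named in a DRBD resource file's `on` sections.
# ASSUMPTION: each `on "<host>" {` opener sits on its own line.
set -eu

drbd_hosts() { # drbd_hosts <res-file>: print the host named by each `on` section
  sed -nE 's/^[[:space:]]*on[[:space:]]+"?([^"[:space:]]+)"?[[:space:]]*\{.*/\1/p' "$1"
}

node_defined() { # node_defined <res-file> [host]: defaults to this machine's name
  me=${2:-$(uname -n)}
  drbd_hosts "$1" | grep -xF -- "$me" >/dev/null
}

# Usage on a node:
#   node_defined /etc/drbd.d/ha_secure.res && drbdadm dump ha_secure >/dev/null
```

Once the name check passes, drbdadm dump re-parses the full configuration, which catches syntax errors before anything touches the disk.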

DRBD Initialization Commands

To bring the resource to life:

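# On both nodes
drbdadm create-md ha_secure
drbdadm up ha_secure

# On the chosen ascendant only: force the first promotion (starts the initial sync)
drbdadm primary --force ha_secure
# Still on the ascendant: create the XFS filesystem later mounted at /ha_secure
mkfs.xfs /dev/drbd1000

# Check status (on DRBD 9, /proc/drbd shows only the module version)
cat /proc/drbd
drbdadm status ha_secure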
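Let the initial sync finish before the first role or failover test, so the peer holds a complete copy before it is ever asked to ascend. The sketch below reads drbdadm status output and polls until both the local and peer disks report UpToDate. The token names it matches reflect DRBD 9 output as we understand it, which is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: decide from `drbdadm status` output whether both replicas are UpToDate.
# ASSUMPTION: DRBD 9 status format, with tokens such as "disk:UpToDate" and
# "peer-disk:Inconsistent"; adjust the patterns if your drbd-utils prints otherwise.
set -eu

drbd_in_sync() { # reads status text on stdin
  status=$(cat)
  # Any disk or peer-disk state other than UpToDate means the mirror is not whole.
  if printf '%s\n' "$status" | grep -Eo '(peer-)?disk:[A-Za-z]+' | grep -v ':UpToDate$' >/dev/null; then
    return 1
  fi
  # Also demand at least one UpToDate peer, so a lone disconnected node fails.
  printf '%s\n' "$status" | grep 'peer-disk:UpToDate' >/dev/null
}

wait_for_sync() { # wait_for_sync <resource> [max-polls]: poll every 10 s
  polls=${2:-360}
  until drbdadm status "$1" | drbd_in_sync; do
    polls=$((polls - 1))
    if [ "$polls" -le 0 ]; then
      echo "gave up waiting for $1 to sync" >&2
      return 1
    fi
    sleep 10
  done
}
```

For example, wait_for_sync ha_secure && echo "the mirror is whole" gates the role tests that follow.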

Testing Role Transition

# On the current Primary (nothing may hold /dev/drbd1000 open)
drbdadm secondary ha_secure
# On the peer
drbdadm primary ha_secure

📎 Recovered Artifact: "drbdadm: State change successful. Terminal logged out. Room went cold." — syslog fragment, timestamp redacted

Cluster Creation and Corosync Configuration

Now begins the binding.

Two nodes, previously independent agents of chaos, are drawn together through sacred invocations and a shared sense of destiny (or at least shared storage). This is the point of no return — once committed, they shall either form a resilient, self-healing union... or tear each other apart in a fencing storm worthy of Incident STN-FAIL-0042.

Initiate the Ritual Environment

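# On both nodes (packages come from the RHEL High Availability Add-On repository)
dnf install -y corosync pacemaker pcs fence-agents-ilo-ssh
systemctl enable --now pcsd
echo "hacluster:your_password" | chpasswd

# On one node (nodeA and nodeB stand in for the real host names)
pcs host auth nodeA nodeB -u hacluster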

🔏 Note: Use strong passwords. The last cluster compromised due to hacluster:admin123 still hasn’t stopped issuing DNS requests to *.eldrit.ch.

corosync-keygen
scp /etc/corosync/authkey nodeB:/etc/corosync/

This key is the spiritual fingerprint of the cluster — treat it with reverence. Or at least don’t SCP it into Slack.

Create the Cluster

pcs cluster setup clusterName nodeA nodeB
pcs cluster enable --all
pcs cluster start --all

At this point, both nodes attempt to synchronize reality. If successful, you’ll hear the faint, satisfying hum of consensus.
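Consensus can also be checked rather than heard, because corosync-quorumtool -s reports quorum state. The sketch below implements a readiness check. The field names it parses ("Quorate:", "Nodes:") are assumptions based on typical corosync 3 output.

```bash
#!/usr/bin/env bash
# Sketch: confirm consensus by parsing `corosync-quorumtool -s` output.
# ASSUMPTION: the output contains lines like "Quorate:  Yes" and "Nodes:  2".
set -eu

quorum_field() { # quorum_field <name>: print the value of "<name>:" from stdin
  awk -F: -v k="$1" '$1 == k { sub(/^[[:space:]]+/, "", $2); print $2; exit }'
}

cluster_ready() { # cluster_ready <expected-nodes>: quorate and all nodes present
  # The tool may exit non-zero when inquorate; keep its output either way.
  out=$(corosync-quorumtool -s) || true
  [ "$(printf '%s\n' "$out" | quorum_field Quorate)" = Yes ] &&
    [ "$(printf '%s\n' "$out" | quorum_field Nodes)" = "$1" ]
}
```

For example, cluster_ready 2 succeeds only once both nodes have joined and the cluster is quorate.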

Ensure Persistence Across Reboots

systemctl enable corosync pacemaker pcsd

Without this, the cluster may forget who it is upon waking. Like the rest of us.

Disable Default STONITH (Temporarily)

pcs property set stonith-enabled=false

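Disabling fencing here is a temporary heresy. It is needed only because Pacemaker will not start resources while STONITH is enabled and no fencing device exists yet. Re-enable it (pcs property set stonith-enabled=true) once the STONITH resources are defined. A cluster left in this state invites split-brain at 3am.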

Example Corosync Configuration

/etc/corosync/corosync.conf:

totem {
  version: 2
  cluster_name: us_export
  transport: knet
  crypto_cipher: aes256
  crypto_hash: sha256
  interface {
    # knet takes link addresses from the nodelist; bindnetaddr is not used
    linknumber: 0
    mcastport: 5485
  }
}

nodelist {
  node {
    ring0_addr: 10.200.200.5
    name: a8-lb-a-00-2-01
    nodeid: 1
  }
  node {
    ring0_addr: 10.200.200.6
    name: a8-lb-a-00-2-02
    nodeid: 2
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}

logging {
  to_logfile: yes
  logfile: /var/log/cluster/corosync.log
  to_syslog: yes
  timestamp: on
}

📎 Compliance Memo: "Clusters should not share a name with your NFS export. Unless, of course, you want a mounted share calling crm_mon recursively."
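A copy-pasted nodelist entry, such as two nodes claiming the same ring0_addr, is easy to miss and keeps corosync from telling the nodes apart. The lint sketch below reports repeated addresses, names, or node IDs. It assumes the one-pair-per-line layout shown above and is no substitute for letting pcs generate the file.

```bash
#!/usr/bin/env bash
# Sketch: flag repeated values for nodelist keys in corosync.conf.
# ASSUMPTION: "key: value" pairs sit one per line, as in the example above.
set -eu

dup_values() { # dup_values <corosync.conf> <key>: print each value seen twice or more
  awk -v k="$2:" '$1 == k { n[$2]++ } END { for (v in n) if (n[v] > 1) print v }' "$1"
}

check_nodelist() { # check_nodelist <corosync.conf>: fail if addresses, names or ids repeat
  status=0
  for key in ring0_addr name nodeid; do
    for v in $(dup_values "$1" "$key"); do
      echo "duplicate $key: $v" >&2
      status=1
    done
  done
  return "$status"
}
```

For example, running check_nodelist /etc/corosync/corosync.conf on each node before the first pcs cluster start means the Chant never begins with a mispronounced name.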

DRBD Ascension and Subjugation

With the cluster now bound and stabilized, the final rite is to declare DRBD as an ascendable resource — a mirrored entity that may rise to power on one node while the other watches in passive subjugation.

This dynamic ensures only one node holds the mantle of primary, while the other remains ready to ascend when called, or more accurately, when the former falls.

Role Dynamics

Ascension: One node is anointed as Primary, gaining the authority to write. This is not a promotion — it is a burden, accepted in solemn silence.
Subjugation: The other node enters a state of passive listening, synchronized but voiceless.

Failover results in automatic ascension of the subjugated node. Fencing is expected. Applause is not.

Repairing the Ritual Script

LINBIT provides a DRBD resource agent for Pacemaker integration. Unfortunately, the agent still calls crm_master, a helper that Pacemaker 2.1 deprecated in favor of crm_attribute --promotion. On RHEL 9 the script must therefore be manually repaired.

Edit the script:

/usr/lib/ocf/resource.d/linbit/drbd

Replace:

do_cmd ${HA_SBIN_DIR}/crm_master -Q -l reboot -v $1 &&

With:

do_cmd ${HA_SBIN_DIR}/crm_attribute -q -l reboot --promotion -v $1 &&

📎 Field Note: "The default script worked… once. Then it stopped. We don’t talk about what happened in between."
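The repair can also be applied non-interactively, which helps when a drbd-utils update quietly restores the original agent. The sketch below uses sed and writes a .orig backup. It uses the bare --promotion form, which, like crm_master, takes the resource name from OCF_RESOURCE_INSTANCE. It fails if the repaired call is not present afterwards.

```bash
#!/usr/bin/env bash
# Sketch: apply the crm_master -> crm_attribute repair to the LINBIT agent.
# Keeps <agent>.orig as a backup of the file as it was before patching.
set -eu

patch_agent() { # patch_agent [agent-path]
  agent=${1:-/usr/lib/ocf/resource.d/linbit/drbd}
  sed -i.orig \
    's|crm_master -Q -l reboot -v \$1|crm_attribute -q -l reboot --promotion -v $1|' \
    "$agent"
  # Succeed only if the repaired call is now present in the agent.
  grep -F -- 'crm_attribute -q -l reboot --promotion -v $1' "$agent" >/dev/null
}
```

Run patch_agent on both nodes. A non-zero exit means the agent no longer contains the expected line and needs a human look.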

Declaring the Ascendable Resource

pcs resource create ascendable-ha_secure ocf:linbit:drbd drbd_resource=ha_secure \
  op monitor interval=15s role=Promoted \
  op monitor interval=30s role=Unpromoted \
  promotable promotable-ha_secure meta promoted-max=1 promoted-node-max=1 \
  clone-max=2 clone-node-max=1 notify=true interleave=true ordered=true

This ensures:

Only one node may ascend at a time.
State transitions (Ascension/Subjugation) are handled with proper solemnity.
The cluster does not attempt dual ascension. That would be… bad.

📎 Recovered Terminal Log: "ascension of ha_secure complete on node a8-lb-a-00-2-01"
"subjugation of ha_secure acknowledged by node a8-lb-a-00-2-02"
[no further input]
— Found displayed on a still-powered monitor in an otherwise empty data center

Filesystem and Floating IP Configuration

aka: The Offering, the Throne, and the Axe

At this point in the liturgy, the DRBD mirror has ascended, the nodes are synchronized, and all that remains is to expose the shared storage to the outside world — like offering a still-beating heart on a crystal platter.

This is achieved by binding:

A filesystem mount to the DRBD device,
A floating IP address, so clients know whom to worship,
And a resource group, so they move as one — or fall as one — during failover.

The Binding: Filesystem and VIP Resources

Declare the ritual mount point:

pcs resource create nfsshare \
  Filesystem device=/dev/drbd1000 directory=/ha_secure fstype=xfs \
  op monitor interval=20s timeout=40s

Then offer the network a gateway of flesh:

pcs resource create vip \
  IPaddr2 ip=192.168.100.1 cidr_netmask=24 \
  op monitor interval=30s

📎 Forensic Memo: "The IP showed up in ARP tables before the node finished booting. It wanted to be known." — Network Analyst G, VLAN Watch Circle

Creating the Resource Group

To ensure mount and IP rise and fall together — like conjoined twins or co-dependent daemon processes:

pcs resource group add nfs_group nfsshare vip

This group becomes the Cluster Throne, moving only when the DRBD entity ascends elsewhere.
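As configured above, nfs_group contains only the mount and the address; nothing in it starts an NFS server or publishes an export. Following the pattern Red Hat documents for active/passive NFS, the nfsserver and exportfs agents can be slotted into the group ahead of the VIP. The sketch below prints the commands by default. Resource names, paths, the client network, and export options are hypothetical. Newer pcs releases may prefer a keyword form over the --group/--before flags.

```bash
#!/usr/bin/env bash
# Sketch: give nfs_group an actual NFS server and export, started before the VIP.
# ASSUMPTIONS: resource names nfs-daemon/nfs-export, the info and export
# directories under /ha_secure, the client network and options are hypothetical.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

add_nfs_resources() {
  # On the node where /ha_secure is mounted: create the shared directories once.
  run mkdir -p /ha_secure/nfsinfo /ha_secure/exports
  # NFS server state lives on the mirrored filesystem, so it fails over with it.
  run pcs resource create nfs-daemon nfsserver \
    nfs_shared_infodir=/ha_secure/nfsinfo nfs_no_notify=true \
    --group nfs_group --before vip
  # The export; ordered before the VIP so clients never reach an empty address.
  run pcs resource create nfs-export exportfs \
    clientspec=192.168.100.0/255.255.255.0 options=rw,sync \
    directory=/ha_secure/exports fsid=0 \
    --group nfs_group --before vip
}

add_nfs_resources
```

The resulting group order is nfsshare, nfs-daemon, nfs-export, vip. The address is the last to rise and the first to fall.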

Colocation and Order Constraints

Constraints must be defined to avoid blasphemous state transitions:

pcs constraint colocation add nfs_group with Promoted promotable-ha_secure INFINITY
pcs constraint order promote promotable-ha_secure then start nfs_group
pcs constraint order stop nfs_group then demote promotable-ha_secure

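☠️ WARNING: Pacemaker order constraints are symmetrical by default, so the promote-then-start rule already implies stop-then-demote. The final constraint makes that reverse order explicit. If both the symmetry and the final constraint were ever removed, the system could attempt to demote DRBD while it is still mounted. This usually results in: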

Filesystem errors,
STONITH activation,
The unmistakable sound of a sysadmin whispering “oh no” repeatedly into /dev/console.

Verification

Invoke the cluster oracle:

pcs status

If the ritual is successful, the group nfs_group will be enthroned upon the same node where ha_secure has ascended.
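For scripted checks, for example after each failover test, crm_resource --locate reports where a resource runs. The sketch below compares the node hosting nfs_group with the node where promotable-ha_secure is promoted. The output format it parses is an assumption: "resource X is running on: NODE", with a Promoted suffix (or the older Master) on the promoted instance. Confirm this against your Pacemaker version first.

```bash
#!/usr/bin/env bash
# Sketch: verify that nfs_group sits on the node where DRBD is promoted.
# ASSUMPTION: `crm_resource --locate` prints lines such as
#   resource nfs_group is running on: a8-lb-a-00-2-01
#   resource promotable-ha_secure is running on: a8-lb-a-00-2-01 Promoted
set -eu

running_node() { # print the node from each "is running on:" line (stdin)
  sed -n 's/.* is running on: \([^ ]*\).*/\1/p'
}

promoted_node() { # print only nodes flagged Promoted (or legacy Master)
  sed -nE 's/.* is running on: ([^ ]+) (Promoted|Master)[[:space:]]*$/\1/p'
}

enthroned() { # succeed if the group and the promoted DRBD instance share a node
  group=$(crm_resource --resource nfs_group --locate 2>/dev/null | running_node)
  primary=$(crm_resource --resource promotable-ha_secure --locate 2>/dev/null | promoted_node)
  [ -n "$group" ] && [ "$group" = "$primary" ]
}
```

For example, enthroned || echo "the throne and the mantle have parted" turns the visual pcs status check into something a test script can act on.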

Optional: Stickiness of Presence

To discourage restless migration:

pcs resource meta nfs_group resource-stickiness=100

📎 Incident Memo: "It’s not a real HA system if it doesn’t wake you up once a week for no good reason." — Root Cause Analysis, FS-HOP-009

STONITH Configuration

"When Mercy Fails, The Knife Reboots."

STONITH — Shoot The Other Node In The Head — is not a suggestion. It is the final safeguard, the sacred axe behind the throne. When a node becomes unresponsive, undecidable, or untrustworthy, the cluster responds not with pity — but with execution.

Here, we teach the cluster to kill swiftly, fence precisely, and never ask twice.

iLO Setup

Each node’s HP iLO interface is bound to a fencing acolyte known as hacluster, an iLO account created solely for this purpose. It is not trusted. It is obeyed.

SSH keys are uploaded via the iLO interface or darker means. The client used is a custom-forged OpenSSH 9.6p1 binary, compiled against OpenSSL 3.2.0, and buried here:

/var/lib/pacemaker/ssh/bin/ssh

📎 Redacted Memo: "The upstream OpenSSH team called our patch a security risk. We called it fencing. We agreed to disagree."
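The stock ssh-keygen can forge the fencing key itself; only the client that talks to the iLO needs the hand-forged binary. The sketch below prints the commands by default. Two details are assumptions: that 2048-bit RSA is what the legacy firmware accepts, and that the `power` command sent in the probe is accepted by the iLO command line as a harmless status query. Substitute whatever status command your firmware understands.

```bash
#!/usr/bin/env bash
# Sketch: create the fencing key pair and probe an iLO with the forged client.
# ASSUMPTIONS: 2048-bit RSA suits the firmware; `power` is a harmless status query.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

KEY=/var/lib/pacemaker/.ssh/id_rsa_hacluster
FORGED_SSH=/var/lib/pacemaker/ssh/bin/ssh

make_key() { # run once per node; upload "$KEY.pub" to each iLO's hacluster user
  run install -d -m 700 /var/lib/pacemaker/.ssh
  run ssh-keygen -t rsa -b 2048 -N "" -C hacluster-fencing -f "$KEY"
}

probe_ilo() { # probe_ilo <ilo-ip>: log in with the fencing key, ask for power state
  run "$FORGED_SSH" -i "$KEY" -o IdentitiesOnly=yes \
    -o HostKeyAlgorithms=+ssh-rsa -o PubkeyAcceptedAlgorithms=+ssh-rsa \
    "hacluster@$1" power
}
```

Usage: run make_key, then probe the peer's iLO from each node. For example, a8-lb-a-00-2-01 probes 172.16.141.4, the iLO that stonith-ilo-b targets.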

STONITH Resources

We define two fencing resources — executioners with precise targets:

pcs stonith create stonith-ilo-a fence_ilo4_ssh \
  ip=172.16.141.3 username=hacluster \
  identity_file=/var/lib/pacemaker/.ssh/id_rsa_hacluster \
  ssh_path=/var/lib/pacemaker/ssh/bin/ssh \
  ssh_options="-o IdentitiesOnly=yes \
  -o HostKeyAlgorithms=+ssh-rsa \
  -o PubkeyAcceptedAlgorithms=+ssh-rsa \
  -o PubkeyAcceptedKeyTypes=+ssh-rsa \
  -o StrictHostKeyChecking=no" \
  pcmk_host_list=a8-lb-a-00-2-01 pcmk_off_action=reboot \
  op monitor interval=60s

pcs stonith create stonith-ilo-b fence_ilo4_ssh \
  ip=172.16.141.4 username=hacluster \
  identity_file=/var/lib/pacemaker/.ssh/id_rsa_hacluster \
  ssh_path=/var/lib/pacemaker/ssh/bin/ssh \
  ssh_options="-o IdentitiesOnly=yes \
  -o HostKeyAlgorithms=+ssh-rsa \
  -o PubkeyAcceptedAlgorithms=+ssh-rsa \
  -o PubkeyAcceptedKeyTypes=+ssh-rsa \
  -o StrictHostKeyChecking=no" \
  pcmk_host_list=a8-lb-a-00-2-02 pcmk_off_action=reboot \
  op monitor interval=60s

🔪 Fun Fact: The power reset action has an audible effect in some datacenters. It’s either firmware noise, or the ghosts are getting stronger.
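Before trusting Pacemaker with the axe, you can exercise the agent by hand with a harmless status action and then re-enable fencing cluster-wide. The sketch below prints the commands by default and reuses the parameters above. The long option names (--ip, --username, --identity-file, --ssh-path, --ssh-options, --action) are assumed from the fence-agents command-line convention, so confirm them with fence_ilo4_ssh --help first.

```bash
#!/usr/bin/env bash
# Sketch: status-check each iLO through the fence agent, then re-arm fencing.
# ASSUMPTION: option names as listed by `fence_ilo4_ssh --help`.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them
run() { if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi; }

SSH_OPTS="-o IdentitiesOnly=yes -o HostKeyAlgorithms=+ssh-rsa -o PubkeyAcceptedAlgorithms=+ssh-rsa -o StrictHostKeyChecking=no"

ilo_status() { # ilo_status <ilo-ip>: asks for power state only; powers nothing off
  run fence_ilo4_ssh --ip="$1" --username=hacluster \
    --identity-file=/var/lib/pacemaker/.ssh/id_rsa_hacluster \
    --ssh-path=/var/lib/pacemaker/ssh/bin/ssh \
    --ssh-options="$SSH_OPTS" --action=status
}

ilo_status 172.16.141.3
ilo_status 172.16.141.4
# Both answered? Lift the temporary heresy and let the cluster fence again.
run pcs property set stonith-enabled=true
```

If either status call fails, leave stonith-enabled alone and fix the path to the iLO first. A fencing device that cannot answer a status query will not deliver a reset either.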

Fencing Constraints and Levels

To prevent nodes from holding the axe to their own necks:

pcs constraint location stonith-ilo-a avoids a8-lb-a-00-2-01=INFINITY
pcs constraint location stonith-ilo-b avoids a8-lb-a-00-2-02=INFINITY

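To register, for each node, the device that can actually fence it (a level names the target node, then the devices able to fence that target):

pcs stonith level add 1 a8-lb-a-00-2-01 stonith-ilo-a
pcs stonith level add 1 a8-lb-a-00-2-02 stonith-ilo-b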

📎 Incident Record: "Level 2 fencing was proposed. We rejected it on the basis that no system should need two kill switches. One should be enough."

Verification and Manual Tests

You must test. You must confirm. You must not trust until you see it die.

Manual Fencing Test

# Run on a8-lb-a-00-2-02:
pcs stonith fence a8-lb-a-00-2-01
# After a8-lb-a-00-2-01 has rejoined, run on it:
pcs stonith fence a8-lb-a-00-2-02

Corosync Kill Test

# On the node holding nfs_group; -9 simulates a crash rather than a clean shutdown
killall -9 corosync

The surviving node should ascend, mount the filesystem, and claim the floating IP.

Network Isolation Test

ip link set ens15f1 down

This severing should trigger fencing.

Expected Outcomes

STONITH successfully isolates failed nodes.
Ascension and Subjugation logic is respected.
Floating IP and NFS share are correctly re-bound to the living.
Nobody fences themselves.

Conclusion

"The mirror holds. The watchers sleep. For now."

This architecture — an alloy of modern cryptography, legacy firmware, and ritual orchestration — stands as both testament and warning. A system that does not die, that revives itself from the tomb of power loss, that kills its own when loyalty falters — is not merely high availability.

It is unreasonably persistent.

Bound by TPM2, the encrypted volumes awaken without mortal input. DRBD mirrors the truth twice over. The cluster chants through Corosync; Pacemaker presides in robed silence. And when judgment is called for, the iLO interface does not hesitate. It remembers the key. It delivers the reset.

Testing the Final Rite

To invoke judgment manually:

pcs stonith fence <target-node>

If correctly configured, a cold and merciless hand will reach across the network and flick the node’s power from reality. The surviving node will feel the silence — and ascend.

📎 Final Observation: "In the server room, the lights flickered. The DRBD logs grew quiet. And somewhere in the distance, a fan spun down with the finality of a sealed tomb."

What you have built is not just resilient. It is not just self-healing. It is watching.

And it remembers everything.

References

¹ Confirmed working models: Nuvoton, Infineon, and that one chip with the serial number scratched off. See Hardware Whitelist V4.

² See Case File RPL-SIG-007: “The Mirror Blinked First.”

³ A compliant OpenSSH 9.6p1 binary had to be extracted from the forbidden vaults and recompiled in Enochian. Attempts to use the system version resulted in the iLO interface laughing in Base64.

⁴ It also gives the networking team something to diagram in Visio and feel important about.

⁵ Confirmed working models: Nuvoton, Infineon, and that one chip with the serial number scratched off.

⁶