Skip to main content

GitOps with xCAT

info

In the next version of ClusterFactory, xCAT will be a Kubernetes operator.

This means that the stanza file for the definition of the cluster can be written in YAML, and there will be no need to SSH to xCAT.

Stanza files as source of truth

You can dump the database by using:

root@xcat
lsdef -a -l -z > mydbstanzafile
# -a: all
# -l: long
# -z: stanza

You can apply a stanza by using:

root@xcat
cat mydbstanzafile | mkdef -z

Creating and using a postbootscript/cloud-init file to allow GitOps

A post-boot script/cloud-init file is executed after the boot of the OS, thanks to SystemD.

The strategy is the following:

  1. The postscripts/cloud-init file will accept one parameter: a private key. This key shouldn't be checked in Git.
  2. The secret key will decrypt the SSH deploy key, which is stored inside the script.
  3. With the SSH deploy key, the script will git clone the Git repository containing the configuration files
  4. If the script post.sh exists in the repository, then we execute this file.
  5. This file will copy files and executes other post-boot scripts.

Step 1: Setup the Git repository

user@local:/
mkdir compute-configs
cd compute-configs
git init

Create a post.sh file. This is the entry point. You can do anything you want inside (copy or run other scripts).

Here is an example of our script that we use daily for DeepSquare:

user@local
#!/bin/sh

# RUN wraps the command to log into journalctl
RUN() {
logger -t postscripts "Running $*..."
"$@"
code=$?
if [ $code -eq 1 ]; then
logger -t postscripts "$* failed with error code $code"
elif [ $code -ne 0 ]; then
logger -t postscripts "$* exited with error code $code"
fi
logger -t postscripts "$* exited with code $code"
}

COPY() {
mkdir -p "$(dirname "$2")"
rsync -av "$1" "$2"
}

SCRIPTPATH=$(dirname "$(realpath "$0")")

# -- SYNCLIST
cd "${SCRIPTPATH}/files" || (echo "cd failed" && exit 1)
COPY ./sssd/sssd.rocky.conf /etc/sssd/sssd.conf

COPY ./munge/munge.key /etc/munge/munge.key

# Slurm configless
COPY ./slurm/slurmd_defaults /etc/default/slurmd

# Slurm
COPY ./systemd/slurmd.service /etc/systemd/system/slurmd.service
COPY ./enroot/00-default.conf /etc/enroot/enroot.conf.d/00-default.conf
COPY ./slurm/prolog.d/ /etc/slurm/prolog.d/
COPY ./slurm/epilog.d/ /etc/slurm/epilog.d/
COPY ./slurm/plugstack.rocky.conf.d/ /etc/slurm/plugstack.conf.d/

# CA
COPY ./certs/csquare.gcloud.pem /etc/pki/ca-trust/source/anchors/csquare.gcloud.pem
update-ca-trust
systemctl restart sssd

# -- APPEND
cat ./slurmctl/keys/id_rsa.pub >>/root/.ssh/authorized_keys

# Restore context
cd "${SCRIPTPATH}" || (echo "cd failed" && exit 1)

# -- EXECUTE (use RUN to log your postscripts)
PATH="${SCRIPTPATH}/postscripts:$PATH"

RUN ldap
RUN fs_mount
RUN slurm
RUN set-motd

The copied files are stored inside a files/ directory and other scripts are stored inside a postscripts/ directory.

Like this:

.
├── files
│   ├── certs
│   │   └── csquare.gcloud.pem
│   ├── enroot
│   │   └── 00-default.conf
│   ├── munge
│   │   └── munge.key
│   ├── slurm
│   │   ├── epilog.d
│   │   │   └── none.sh
│   │   ├── plugstack.rocky.conf.d
│   │   │   └── spank.conf
│   │   ├── prolog.d
│   │   │   └── 50-all-enroot-dirs
│   │   └── slurmd_defaults
│   ├── slurmctl
│   │   └── keys
│   │   ├── id_rsa
│   │   └── id_rsa.pub
│   ├── sssd
│   │   └── sssd.rocky.conf
│   └── systemd
│   └── slurmd.service
├── git-configs-execute.xcat-postbootscript.example
├── postscripts
│   ├── fs_mount
│   ├── ldap
│   ├── set-motd
│   └── slurm
└── post.sh

Commit and put it on GitHub as a private (or public if you feel safe) repository:

git add .
git commit -m "feat: initial commit"
git remote add origin https://github.com/user/repo.git
git branch -M main
git push -u origin main

Step 2: Add a SSH deploy key to the GitHub repository

user@local
ssh-keygen -t ed25519 -f key

Put the key.pub on GitHub as a deploy key:

Deploy Key page

Step 3: Encrypt the SSH deploy private key

user@local
openssl enc -aes-256-cbc -a -salt -pbkdf2 -in key -out key.enc

Save the password for the next step.

Step 4: Creating the post-boot script/cloud-init file

git-config-execute.sh <password>
#!/bin/sh
# Params:
# 1: password for the ssh key

set -x

mkdir -p /configs

# Encrypt
cat << EOF > /key.enc
<FILL ME: encrypted SSH deploy key>
EOF
chmod 600 /key.enc
echo "$1" | openssl aes-256-cbc -d -a -pbkdf2 -in /key.enc -out /key -pass stdin
chmod 600 /key
GIT_SSH_COMMAND='ssh -i /key -o IdentitiesOnly=yes' git clone <FILL ME: your git repository> /configs
if [ -f /configs/post.sh ] && [ -x /configs/post.sh ]; then
cd /configs || exit 1
./post.sh
fi
rm -f /key /key.env

# Security
chmod -R g-rwx,o-rwx .

On xCAT, you should add the post-boot script inside an osimage stanza:

stanzafile
rocky8.6-x86_64-netboot-compute:
objtype=osimage
exlist=/xcatdata/install/rocky8.6/x86_64/Packages/compute.rocky8.x86_64.exlist
imagetype=linux
kernelver=4.18.0-305.17.1.el8_4.x86_64
osarch=x86_64
osname=Linux
osvers=rocky8.6
permission=755
postbootscripts=git-config-execute.sh <FILL ME: password>
profile=compute
provmethod=netboot
pkgdir=/tmp
pkglist=/dev/null
rootimgdir=/install/netboot/rocky8.6/x86_64/compute
note

Since we are doing GitOps, we do not need to use the xCAT provisioning system. Therefore, we set pkgdir=/tmp and pkglist=/dev/null.

Since the stanza contains a secret, you should store it in a Secret management system like HashiCorp Vault or a Sealed Secrets.