GitOps with xCAT
In the next version of ClusterFactory, xCAT will be a Kubernetes operator.
This means that the stanza file for the definition of the cluster can be written in YAML, and there will be no need to SSH to xCAT.
Stanza files as source of truth
You can dump the database by using:
lsdef -a -l -z > mydbstanzafile
# -a: all
# -l: long
# -z: stanza
You can apply a stanza by using:
cat mydbstanzafile | mkdef -z
Creating and using a postbootscript/cloud-init file to allow GitOps
A post-boot script/cloud-init file is executed after the boot of the OS, thanks to SystemD.
The strategy is the following:
- The postscripts/cloud-init file will accept one parameter: a private key. This key shouldn't be checked in Git.
- The secret key will decrypt the SSH deploy key, which is stored inside the script.
- With the SSH deploy key, the script will
git clone
the Git repository containing the configuration files - If the script
post.sh
exists in the repository, then we execute this file. - This file will copy files and executes other post-boot scripts.
Step 1: Setup the Git repository
mkdir compute-configs
cd compute-configs
git init
Create a post.sh
file. This is the entry point. You can do anything you want inside (copy or run other scripts).
Here is an example of our script that we use daily for DeepSquare:
#!/bin/sh
# RUN wraps the command to log into journalctl
RUN() {
logger -t postscripts "Running $*..."
"$@"
code=$?
if [ $code -eq 1 ]; then
logger -t postscripts "$* failed with error code $code"
elif [ $code -ne 0 ]; then
logger -t postscripts "$* exited with error code $code"
fi
logger -t postscripts "$* exited with code $code"
}
COPY() {
mkdir -p "$(dirname "$2")"
rsync -av "$1" "$2"
}
SCRIPTPATH=$(dirname "$(realpath "$0")")
# -- SYNCLIST
cd "${SCRIPTPATH}/files" || (echo "cd failed" && exit 1)
COPY ./sssd/sssd.rocky.conf /etc/sssd/sssd.conf
COPY ./munge/munge.key /etc/munge/munge.key
# Slurm configless
COPY ./slurm/slurmd_defaults /etc/default/slurmd
# Slurm
COPY ./systemd/slurmd.service /etc/systemd/system/slurmd.service
COPY ./enroot/00-default.conf /etc/enroot/enroot.conf.d/00-default.conf
COPY ./slurm/prolog.d/ /etc/slurm/prolog.d/
COPY ./slurm/epilog.d/ /etc/slurm/epilog.d/
COPY ./slurm/plugstack.rocky.conf.d/ /etc/slurm/plugstack.conf.d/
# CA
COPY ./certs/csquare.gcloud.pem /etc/pki/ca-trust/source/anchors/csquare.gcloud.pem
update-ca-trust
systemctl restart sssd
# -- APPEND
cat ./slurmctl/keys/id_rsa.pub >>/root/.ssh/authorized_keys
# Restore context
cd "${SCRIPTPATH}" || (echo "cd failed" && exit 1)
# -- EXECUTE (use RUN to log your postscripts)
PATH="${SCRIPTPATH}/postscripts:$PATH"
RUN ldap
RUN fs_mount
RUN slurm
RUN set-motd
The copied files are stored inside a files/
directory and other scripts are stored inside a postscripts/
directory.
Like this:
.
├── files
│ ├── certs
│ │ └── csquare.gcloud.pem
│ ├── enroot
│ │ └── 00-default.conf
│ ├── munge
│ │ └── munge.key
│ ├── slurm
│ │ ├── epilog.d
│ │ │ └── none.sh
│ │ ├── plugstack.rocky.conf.d
│ │ │ └── spank.conf
│ │ ├── prolog.d
│ │ │ └── 50-all-enroot-dirs
│ │ └── slurmd_defaults
│ ├── slurmctl
│ │ └── keys
│ │ ├── id_rsa
│ │ └── id_rsa.pub
│ ├── sssd
│ │ └── sssd.rocky.conf
│ └── systemd
│ └── slurmd.service
├── git-configs-execute.xcat-postbootscript.example
├── postscripts
│ ├── fs_mount
│ ├── ldap
│ ├── set-motd
│ └── slurm
└── post.sh
Commit and put it on GitHub as a private (or public if you feel safe) repository:
git add .
git commit -m "feat: initial commit"
git remote add origin https://github.com/user/repo.git
git branch -M main
git push -u origin main
Step 2: Add a SSH deploy key to the GitHub repository
ssh-keygen -t ed25519 -f key
Put the key.pub
on GitHub as a deploy key:
Step 3: Encrypt the SSH deploy private key
openssl enc -aes-256-cbc -a -salt -pbkdf2 -in key -out key.enc
Save the password for the next step.
Step 4: Creating the post-boot script/cloud-init file
- xCAT
- cloud-init
#!/bin/sh
# Params:
# 1: password for the ssh key
set -x
mkdir -p /configs
# Encrypt
cat << EOF > /key.enc
<FILL ME: encrypted SSH deploy key>
EOF
chmod 600 /key.enc
echo "$1" | openssl aes-256-cbc -d -a -pbkdf2 -in /key.enc -out /key -pass stdin
chmod 600 /key
GIT_SSH_COMMAND='ssh -i /key -o IdentitiesOnly=yes' git clone <FILL ME: your git repository> /configs
if [ -f /configs/post.sh ] && [ -x /configs/post.sh ]; then
cd /configs || exit 1
./post.sh
fi
rm -f /key /key.env
# Security
chmod -R g-rwx,o-rwx .
On xCAT, you should add the post-boot script inside an osimage
stanza:
rocky8.6-x86_64-netboot-compute:
objtype=osimage
exlist=/xcatdata/install/rocky8.6/x86_64/Packages/compute.rocky8.x86_64.exlist
imagetype=linux
kernelver=4.18.0-305.17.1.el8_4.x86_64
osarch=x86_64
osname=Linux
osvers=rocky8.6
permission=755
postbootscripts=git-config-execute.sh <FILL ME: password>
profile=compute
provmethod=netboot
pkgdir=/tmp
pkglist=/dev/null
rootimgdir=/install/netboot/rocky8.6/x86_64/compute
Since we are doing GitOps, we do not need to use the xCAT provisioning system. Therefore, we set pkgdir=/tmp
and pkglist=/dev/null
.
Since the stanza contains a secret, you should store it in a Secret management system like HashiCorp Vault or a Sealed Secrets.
#cloud-config
write_files:
- content: |
<FILL ME: encrypted SSH deploy key>
path: /key.enc
permissions: '0600'
runcmd:
- [
sh,
-c,
"echo '<FILL ME: password>' | openssl aes-256-cbc -d -a -pbkdf2 -in /key.enc -out /key -pass stdin",
]
- [chmod, '600', /key]
- [
sh,
-c,
"mkdir -p /configs && GIT_SSH_COMMAND='ssh -i /key -o IdentitiesOnly=yes' git clone <FILL ME: your git repository> /configs",
]
- [
sh,
-c,
'if [ -f /configs/post.sh ] && [ -x /configs/post.sh ]; then cd /configs && ./post.sh compute; fi',
]
- [rm, -f, /key, /key.enc]
- [chmod, -R, 'g-rwx,o-rwx', '.']
Since the cloud-init
contains a secret, you should store it in a Secret management system like HashiCorp Vault or a Sealed Secrets.