
Deploying SLURM Cluster


caution

Deploying the SLURM database isn't stable yet. Please feel free to create an issue so we can improve its stability.

Helm and Docker resources

The Helm resources are stored in the ClusterFactory Git repository.

The Dockerfile is described in the git repository deepsquare-io/slurm-docker.

The Docker images can be pulled with:

docker pull ghcr.io/deepsquare-io/slurm:latest-controller
docker pull ghcr.io/deepsquare-io/slurm:latest-login
docker pull ghcr.io/deepsquare-io/slurm:latest-db
docker pull ghcr.io/deepsquare-io/slurm:latest-rest
note

You should always verify the default Helm values before editing the values field of an Argo CD Application.

1. Preparation

Unlike the other guides, we will start from scratch.

Delete the argo/slurm-cluster directory (or rename it).

Deploying a SLURM cluster isn't easy and you MUST have these components ready:

  • An LDAP server and an SSSD configuration, to synchronize user IDs across the cluster
  • A MySQL server for the SLURM DB
  • A JWT private key, for authentication via the REST API
  • A MUNGE key, for the authentication of the SLURM daemons

Namespace and AppProject

Create and apply the Namespace and AppProject:

argo/slurm-cluster/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: slurm-cluster
  labels:
    app.kubernetes.io/name: slurm-cluster
argo/slurm-cluster/app-project.yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: slurm-cluster
  namespace: argocd
  # Finalizer that ensures that project is not deleted until it is not referenced by any application
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  description: Slurm cluster
  # Allow manifests to deploy from any Git repos
  sourceRepos:
    - '*'
  # Only permit applications to deploy to the namespace in the same cluster
  destinations:
    - namespace: slurm-cluster
      server: https://kubernetes.default.svc

  namespaceResourceWhitelist:
    - kind: '*'
      group: '*'

  clusterResourceWhitelist:
    - kind: '*'
      group: '*'
user@local:/ClusterFactory
kubectl apply -f argo/slurm-cluster/
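
You can quickly verify that both resources exist (a sanity check, not a required step):

user@local:/ClusterFactory
kubectl get namespace slurm-cluster
kubectl get appprojects.argoproj.io -n argocd slurm-cluster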

LDAP deployment

  1. Follow the guide.

  2. Open a shell on the LDAP server container and create a user and a group slurm:

Pod: dirsrv-389ds-0 (namespace: ldap)
# Create user
dsidm -b "dc=example,dc=com" localhost user create \
--uid slurm \
--cn slurm \
--displayName slurm \
--homeDirectory "/dev/shm" \
--uidNumber 1501 \
--gidNumber 1501

# Create group
dsidm -b "dc=example,dc=com" localhost group create \
--cn slurm

# Add posixGroup property and gidNumber
dsidm -b "dc=example,dc=com" localhost group modify slurm \
"add:objectClass:posixGroup" \
"add:gidNumber:1501"

SSSD configuration

Let's store it in a Secret:

  1. Create the argo/slurm-cluster/secrets/ directory and create a -secret.yaml.local file:
argo/slurm-cluster/secrets/sssd-secret.yaml.local
apiVersion: v1
kind: Secret
metadata:
  name: sssd-secret
  namespace: slurm-cluster
type: Opaque
stringData:
  sssd.conf: |
    # https://sssd.io/docs/users/troubleshooting/how-to-troubleshoot-backend.html
    [sssd]
    services = nss,pam,sudo,ssh
    config_file_version = 2
    domains = example-ldap

    [sudo]

    [nss]

    [pam]
    offline_credentials_expiration = 60

    [domain/example-ldap]
    debug_level=3
    cache_credentials = True
    dns_resolver_timeout = 15

    override_homedir = /home/ldap-users/%u

    id_provider = ldap
    auth_provider = ldap
    access_provider = ldap
    chpass_provider = ldap

    ldap_schema = rfc2307bis

    ldap_uri = ldaps://dirsrv-389ds.ldap.svc.cluster.local:3636
    ldap_default_bind_dn = cn=Directory Manager
    ldap_default_authtok = <password>
    ldap_search_timeout = 50
    ldap_network_timeout = 60
    ldap_user_member_of = memberof
    ldap_user_gecos = cn
    ldap_user_uuid = nsUniqueId
    ldap_group_uuid = nsUniqueId

    ldap_search_base = ou=people,dc=example,dc=com
    ldap_group_search_base = ou=groups,dc=example,dc=com
    ldap_sudo_search_base = ou=sudoers,dc=example,dc=com
    ldap_user_ssh_public_key = nsSshPublicKey

    ldap_account_expire_policy = rhds
    ldap_access_order = filter, expire
    ldap_access_filter = (objectClass=posixAccount)

    ldap_tls_cipher_suite = HIGH
    # On Ubuntu, the LDAP client is linked against GnuTLS instead of OpenSSL, so the cipher suite names differ.
    # In fact, GnuTLS expects a "priority list" rather than a cipher suite name; see gnutls-cli --priority-list
    # See https://backreference.org/2009/11/18/openssl-vs-gnutls-cipher-names/ (gnutls-cli is part of the gnutls-bin package)

Adapt this secret based on your LDAP configuration.
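
Before sealing the secret, you may want to check that the bind DN, password, and search base are correct. A minimal check from any machine that can reach the LDAP service (the ldapsearch invocation below is an illustration based on the configuration above):

user@local
ldapsearch -H ldaps://dirsrv-389ds.ldap.svc.cluster.local:3636 \
  -D "cn=Directory Manager" -W \
  -b "ou=people,dc=example,dc=com" "(uid=slurm)" uid uidNumber gidNumber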

  2. Seal the secret:
user@local:/ClusterFactory
cfctl kubeseal
  3. Apply the SealedSecret:
user@local:/ClusterFactory
kubectl apply -f argo/slurm-cluster/secrets/sssd-sealed-secret.yaml

MySQL deployment

You can deploy MySQL using the Helm Chart of Bitnami and deploy it with ArgoCD.

After deploying the MySQL/MariaDB server, you must create the slurm user and the slurm_acct_db database. Open a shell on the MySQL container and run:

Pod: mariadb-0
mysql -u root -p -h localhost
# Enter your root password
Pod: mariadb-0 (sql)
create database slurm_acct_db;
create user 'slurm'@'%' identified by '<your password>';
grant all on slurm_acct_db.* to 'slurm'@'%';
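
You can then verify that the slurm user can reach the database (a minimal check; replace the host with your MariaDB service address if you run it from elsewhere):

Pod: mariadb-0
mysql -u slurm -p -h localhost slurm_acct_db -e 'SELECT 1;'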

JWT Key generation

user@local
ssh-keygen -t rsa -b 4096 -m PEM -f jwtRS256.key
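
ssh-keygen writes two files, jwtRS256.key and jwtRS256.key.pub; only the private key goes into the Secret below:

user@local
ls jwtRS256.key jwtRS256.key.pub
cat jwtRS256.key # paste this PEM block into the jwt_hs256.key field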

Let's store it in a Secret:

  1. Create a -secret.yaml.local file:
argo/slurm-cluster/secrets/slurm-secret.yaml.local
apiVersion: v1
kind: Secret
metadata:
  name: slurm-secret
  namespace: slurm-cluster
type: Opaque
stringData:
  jwt_hs256.key: |
    -----BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----
  2. Seal the secret:
user@local:/ClusterFactory
cfctl kubeseal
  3. Apply the SealedSecret:
user@local:/ClusterFactory
kubectl apply -f argo/slurm-cluster/secrets/slurm-sealed-secret.yaml

MUNGE Key generation

root@local
# As root
dnf install -y munge
/usr/sbin/create-munge-key
base64 -w 0 /etc/munge/munge.key
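
Optionally, you can check that the key works before encoding it (this requires the munge daemon to be running on the same machine):

root@local
systemctl start munge
munge -n | unmunge
# Should report STATUS: Success (0)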

Let's store it in a Secret:

  1. Create a -secret.yaml.local file:
argo/slurm-cluster/secrets/munge-secret.yaml.local
apiVersion: v1
kind: Secret
metadata:
  name: munge-secret
  namespace: slurm-cluster
type: Opaque
data:
  munge.key: <base64-encoded key, on a single line>
  2. Seal the secret:
user@local:/ClusterFactory
cfctl kubeseal
  3. Apply the SealedSecret:
user@local:/ClusterFactory
kubectl apply -f argo/slurm-cluster/secrets/munge-sealed-secret.yaml

2. Begin writing the slurm-cluster-<cluster name>-app.yaml

2.a. Argo CD Application configuration

argo/slurm-cluster/apps/slurm-cluster-<cluster name>-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: slurm-cluster-<FILL ME: cluster name>-app
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: slurm-cluster
  source:
    # You should have forked this repo. Change the URL to your fork.
    repoURL: git@github.com:<FILL ME: your account>/ClusterFactory.git
    # You should use your branch too.
    targetRevision: HEAD
    path: helm/slurm-cluster
    helm:
      releaseName: slurm-cluster-<FILL ME: cluster name>

      valueFiles:
        - values-<FILL ME: cluster name>.yaml

  destination:
    server: 'https://kubernetes.default.svc'
    namespace: slurm-cluster

  syncPolicy:
    automated:
      prune: true # Specifies if resources should be pruned during auto-syncing (false by default).
      selfHeal: true # Specifies if partial app sync should be executed when resources are changed only in target Kubernetes cluster and no git change detected (false by default).
      allowEmpty: false # Allows deleting all application resources during automatic syncing (false by default).
    syncOptions: []
    retry:
      limit: 5 # Number of failed sync attempt retries; unlimited number of attempts if less than 0.
      backoff:
        duration: 5s # The amount to back off. Default unit is seconds, but could also be a duration (e.g. "2m", "1h").
        factor: 2 # A factor to multiply the base duration after each failed retry.
        maxDuration: 3m # The maximum amount of time allowed for the backoff strategy.

2.b. Values: Configuring the SLURM cluster

Add:

helm/slurm-cluster/values-<cluster name>.yaml
sssd:
  secretName: sssd-secret

munge:
  secretName: munge-secret

jwt:
  secretName: slurm-secret

slurmConfig:
  clusterName: <FILL ME: cluster-name>

  compute:
    srunPortRangeStart: 60001
    srunPortRangeEnd: 63000
    debug: debug5

  accounting: |
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=slurm-cluster-<FILL ME: cluster name>.slurm-cluster.svc.cluster.local
    AccountingStoragePort=6819
    AccountingStorageTRES=gres/gpu

  controller:
    parameters: enable_configless
    debug: debug5

  defaultResourcesAllocation: |
    # Change accordingly
    DefCpuPerGPU=4
    DefMemPerCpu=7000

  nodes: |
    # Change accordingly
    NodeName=cn[1-12] CPUs=32 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=128473 Gres=gpu:4

  partitions: |
    # Change accordingly
    PartitionName=main Nodes=cn[1-12] Default=YES MaxTime=INFINITE State=UP OverSubscribe=NO TRESBillingWeights="CPU=2.6,Mem=0.25G,GRES/gpu=24.0"

  gres: |
    # Change accordingly
    NodeName=cn[1-12] File=/dev/nvidia[0-3] AutoDetect=nvml

  # Extra slurm.conf configuration
  extra: |
    LaunchParameters=enable_nss_slurm
    DebugFlags=Script,Gang,SelectType
    TCPTimeout=5

    # MPI stacks running over Infiniband or OmniPath require the ability to allocate more
    # locked memory than the default limit. Unfortunately, user processes on login nodes
    # may have a small memory limit (check it with ulimit -a) which is propagated by default
    # into Slurm jobs and hence causes fabric errors for MPI.
    PropagateResourceLimitsExcept=MEMLOCK

    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup
    SwitchType=switch/none
    MpiDefault=pmix_v2
    ReturnToService=2
    GresTypes=gpu
    PreemptType=preempt/qos
    PreemptMode=REQUEUE
    PreemptExemptTime=-1
    Prolog=/etc/slurm/prolog.d/*
    Epilog=/etc/slurm/epilog.d/*

    # Federation
    FederationParameters=fed_display
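
To fill the nodes: entry above accurately, you can run slurmd -C on a compute node once the Slurm packages are installed (see step 5); it prints a NodeName line matching the detected hardware. The output below is illustrative:

root@cn1
slurmd -C
# NodeName=cn1 CPUs=32 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=128473
# UpTime=0-01:23:45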

3. Slurm DB Deployment

3.a. Secrets

Assuming you have deployed LDAP and MySQL, we will store the slurmdbd.conf inside a secret:

  1. Create a -secret.yaml.local file:
argo/slurm-cluster/secrets/slurmdbd-conf-secret.yaml.local
apiVersion: v1
kind: Secret
metadata:
  name: slurmdbd-conf-secret
  namespace: slurm-cluster
type: Opaque
stringData:
  slurmdbd.conf: |
    # See https://slurm.schedmd.com/slurmdbd.conf.html
    ### Main
    DbdHost=slurm-cluster-<FILL ME: cluster name>-db-0
    SlurmUser=slurm

    ### Logging
    DebugLevel=debug5 # optional, defaults to 'info'. Possible values: fatal, error, info, verbose, debug, debug[2-5]
    LogFile=/var/log/slurm/slurmdbd.log
    PidFile=/var/run/slurmdbd.pid
    LogTimeFormat=thread_id

    AuthAltTypes=auth/jwt
    AuthAltParameters=jwt_key=/var/spool/slurm/jwt_hs256.key

    ### Database server configuration
    StorageType=accounting_storage/mysql
    StorageHost=<FILL ME>
    StorageUser=<FILL ME>
    StoragePass=<FILL ME>

Replace the <FILL ME> according to your existing configuration.

  2. Seal the secret:
user@local:/ClusterFactory
cfctl kubeseal
  3. Apply the SealedSecret:
user@local:/ClusterFactory
kubectl apply -f argo/slurm-cluster/secrets/slurmdbd-conf-sealed-secret.yaml

3.b. Values: Enable SLURM DB

Edit the helm/slurm-cluster/values-<cluster name>.yaml values file.

Let's add the values to deploy a SLURM DB.

helm/slurm-cluster/values-<cluster name>.yaml
db:
  enabled: true

  config:
    secretName: slurmdbd-conf-secret

If you are using LDAPS and the CA is private:

helm/slurm-cluster/values-<cluster name>.yaml
db:
  enabled: true

  config:
    secretName: slurmdbd-conf-secret

  command: ['sh', '-c', 'update-ca-trust && /init']

  volumeMounts:
    - name: ca-cert
      mountPath: /etc/pki/ca-trust/source/anchors/example.com.ca.pem
      subPath: example.com.ca.pem

  volumes:
    - name: ca-cert
      secret:
        secretName: local-ca-secret

local-ca-secret is a Secret containing example.com.ca.pem.

You can already deploy it:

git add .
git commit -m "Added SLURM DB values"
git push
user@local:/ClusterFactory
# This is optional if the application is already deployed.
kubectl apply -f argo/slurm-cluster/apps/slurm-cluster-<cluster name>-app.yaml
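
Once applied, you can check that Argo CD synced the application and that the slurmdbd pod started (the exact pod name depends on your release name):

user@local:/ClusterFactory
kubectl get applications.argoproj.io -n argocd slurm-cluster-<cluster name>-app
kubectl get pods -n slurm-cluster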

The service should be accessible at the address slurm-cluster-<cluster name>-db-0.slurm-cluster.svc.cluster.local. Use that URL in the slurm config.

4. Slurm Controller Deployment

4.a. Values: Enable SLURM Controller

Let's add the values to deploy a SLURM Controller.

helm/slurm-cluster/values-<cluster name>.yaml
controller:
  enabled: true

  persistence:
    storageClassName: 'dynamic-nfs'
    accessModes: ['ReadWriteOnce']
    size: 10Gi

  nodeSelector:
    topology.kubernetes.io/region: <FILL ME> # <country code>-<city>
    topology.kubernetes.io/zone: <FILL ME> # <country code>-<city>-<index>

  resources:
    requests:
      cpu: '250m'
      memory: '1Gi'
    limits:
      cpu:
      memory: '1Gi'

Notice that a nodeSelector is used: the SLURM controller uses the host network, so we pin it to a region and zone and avoid having it move around. You can also pin it to a specific machine with kubernetes.io/hostname.

We might develop an HA setup in a future version of ClusterFactory.

If you are using LDAPS and the CA is private, append these values:

helm/slurm-cluster/values-<cluster name>.yaml
controller:
  # ...
  command: ['sh', '-c', 'update-ca-trust && /init']

  volumeMounts:
    - name: ca-cert
      mountPath: /etc/pki/ca-trust/source/anchors/example.com.ca.pem
      subPath: example.com.ca.pem

  volumes:
    - name: ca-cert
      secret:
        secretName: local-ca-secret

local-ca-secret is a Secret containing example.com.ca.pem.

You can already deploy it:

git add .
git commit -m "Added SLURM Controller values"
git push
user@local:/ClusterFactory
# This is optional if the application is already deployed.
kubectl apply -f argo/slurm-cluster/apps/slurm-cluster-<cluster name>-app.yaml
note

The SLURM controller is in host mode, using a hostPort, so it can communicate with the bare-metal hosts. A SLURM controller Service also runs for internal communication with the Slurm DB and Slurm Login.

4.b. Testing: sinfo from the controller pod

You should be able to run a kubectl exec session on the controller pod and execute sinfo:

user@local
[user@local /]> kubectl exec -it -n slurm-cluster slurm-cluster-<cluster-name>-controller-0 -c slurm-cluster-<cluster-name>-controller -- bash

[root@slurm-cluster-reindeer-controller-0 /]> sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 12 down* cn[1-12]

5. Slurm Compute Bare-Metal Deployment

5.a. Build an OS Image with Slurm

We enabled configless mode in the slurm.conf.

We need to build an OS image with the Slurm daemon (slurmd) installed.

Install these packages:

pmix4
slurm
slurm-contribs
slurm-libpmi
slurm-pam_slurm
slurm-slurmd

libnvidia-container1
libnvidia-container-tools
enroot-hardened
enroot-hardened+caps
nvslurm-plugin-pyxis

from the DeepSquare YUM repository.
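
Assuming the DeepSquare repository is already configured in the image, the install step could look like this (a sketch; adapt it to your image-building tool):

root@local
# Inside the OS image build environment
dnf install -y pmix4 slurm slurm-contribs slurm-libpmi slurm-pam_slurm slurm-slurmd \
  libnvidia-container1 libnvidia-container-tools enroot-hardened enroot-hardened+caps \
  nvslurm-plugin-pyxis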

5.b. Postscripts

Next, you have to configure a systemd service using a postscript.

The service:

/etc/systemd/system/slurmd.service
[Unit]
Description=Slurm node daemon
After=network.target munge.service

[Service]
Type=forking
ExecStartPre=/usr/bin/id slurm
Restart=always
RestartSec=3
ExecStart=/usr/sbin/slurmd -d /usr/sbin/slurmstepd --conf-server slurm-cluster-<cluster name>-controller-0.example.com
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurmd.pid
KillMode=process
LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity

[Install]
WantedBy=multi-user.target
warning

Add slurm-cluster-<cluster name>-controller-0.example.com to the CoreDNS configuration.

If you are using a hostPort, put the IP of the Kubernetes host hosting the pod. If you are using a LoadBalancer, put the IP you've given to the LoadBalancer. If you are using IPVLAN, put the IP you've given to the IPVLAN.

A simple postbootscript:

#!/bin/sh -ex
# Copy the CA certificate created for the private-cluster-issuer
cp ./certs/my-ca.pem /etc/pki/ca-trust/source/anchors/my-ca.pem

mkdir -p /var/log/slurm/

cat <<\END | base64 -d >/etc/munge/munge.key
...
END

chmod 600 /etc/munge/munge.key

cat <<\END >/etc/sssd/sssd.conf
...
END

chmod 600 /etc/sssd/sssd.conf

#-- Add enroot extra hooks for PMIx and PyTorch multi-node support
cp /usr/share/enroot/hooks.d/50-slurm-pmi.sh /usr/share/enroot/hooks.d/50-slurm-pytorch.sh /etc/enroot/hooks.d

# Enable Pyxis (container jobs)
cat <<\END >/etc/slurm/plugstack.conf.d/pyxis.conf
optional /usr/lib64/slurm/spank_pyxis.so runtime_path=/run/pyxis container_scope=job
END

cat <<\END >/etc/systemd/system/slurmd.service
[Unit]
Description=Slurm node daemon
After=network.target munge.service remote-fs.target
Wants=network-online.target

[Service]
Type=simple
Restart=always
RestartSec=3
ExecStart=/usr/sbin/slurmd -d /usr/sbin/slurmstepd --conf-server <node host IP>
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurmd.pid
KillMode=process
LimitNOFILE=131072
LimitMEMLOCK=infinity
LimitSTACK=infinity
Delegate=yes

StandardOutput=null
StandardError=null

[Install]
WantedBy=multi-user.target
END

#-- Wait for LDAP
update-ca-trust
systemctl restart sssd
while ! id slurm
do
sleep 1
done

systemctl daemon-reload
systemctl restart munge
systemctl enable slurmd
systemctl start --no-block slurmd

After setting up SLURM, you should also:

  • Mount the home directory of the LDAP users (probably at /home/ldap-users), as sketched after this list
  • Use the postscript to configure SSSD
  • Use the postscript to import the munge.key
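
For the home directory, a minimal sketch to append to the postscript, assuming an NFS export like the one used for the login nodes later in this guide (the server and path are placeholders):

# Mount the LDAP users' home directory over NFS
mkdir -p /home/ldap-users
echo "<FILL ME: NFS server>:/srv/nfs/k8s/ldap-users /home/ldap-users nfs hard,nfsvers=4.1,noatime,nodiratime 0 0" >>/etc/fstab
mount /home/ldap-users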
Troubleshooting

In this order:

  1. Check your journal (journalctl) and read the logs.
  2. Stuck in an id slurm loop?
    • Check the SSSD configuration
    • Check the TLS certificate
    • Check that the Traefik and LDAP ports are reachable (nc -vz <LB IP> <LDAP PORT>)
  3. SSSD is working (id slurm shows UID 1501), but sinfo is crashing
    • Check the health of your SLURM controller pod
    • Check that the SLURM controller ports are reachable (nc -vz <SLURM CTL IP> 6817)
    • Check that the domain name of the SLURM controller can be resolved (dig @<DNS SERVER IP> slurm-cluster-<cluster name>-controller-0.example.com)
    • Check the DNS client configuration (/etc/resolv.conf)
  4. slurmd is crashing, but sinfo works
    • Check /var/log/slurm/slurm.log

5.c. Reboot the nodes

If the controller is running, the nodes should automatically receive the slurm.conf inside /run/slurm/conf.

6. Slurm Login Deployment

6.a. Secrets and Volumes

SSH Server configuration

The login nodes can be exposed to the external network using Multus CNI and the IPVLAN plugin. This exposes the srunPortRange and the SSH port.

If you don't plan to use srun and prefer sbatch, we recommend using a simple Kubernetes Service to expose the login nodes.

Thanks to SSSD, users can log in to the nodes over SSH with the passwords stored in LDAP.

We have to generate the SSH host keys:

user@local
yes 'y' | ssh-keygen -N '' -f ./ssh_host_rsa_key -t rsa -C login-node
yes 'y' | ssh-keygen -N '' -f ./ssh_host_ecdsa_key -t ecdsa -C login-node
yes 'y' | ssh-keygen -N '' -f ./ssh_host_ed25519_key -t ed25519 -C login-node

Six files will be generated. We will also add our sshd_config.

  1. Create a -secret.yaml.local file:
argo/slurm-cluster/secrets/login-sshd-secret.yaml.local
apiVersion: v1
kind: Secret
metadata:
  name: login-sshd-secret
  namespace: slurm-cluster
type: Opaque
stringData:
  ssh_host_ecdsa_key: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    <FILL ME>
    -----END OPENSSH PRIVATE KEY-----
  ssh_host_ecdsa_key.pub: |
    ecdsa-sha2-nistp256 <FILL ME>
  ssh_host_ed25519_key: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    <FILL ME>
    -----END OPENSSH PRIVATE KEY-----
  ssh_host_ed25519_key.pub: |
    ssh-ed25519 <FILL ME>
  ssh_host_rsa_key: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    <FILL ME>
    -----END OPENSSH PRIVATE KEY-----
  ssh_host_rsa_key.pub: |
    ssh-rsa <FILL ME>
  sshd_config: |
    Port 22
    AddressFamily any
    ListenAddress 0.0.0.0
    ListenAddress ::

    HostKey /etc/ssh/ssh_host_rsa_key
    HostKey /etc/ssh/ssh_host_ecdsa_key
    HostKey /etc/ssh/ssh_host_ed25519_key

    PermitRootLogin prohibit-password
    PasswordAuthentication yes

    # Change to yes to enable challenge-response passwords (beware issues with
    # some PAM modules and threads)
    ChallengeResponseAuthentication no

    UsePAM yes

    X11Forwarding yes
    PrintMotd no
    AcceptEnv LANG LC_*

    # override default of no subsystems
    Subsystem sftp /usr/lib/openssh/sftp-server

    AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
    AuthorizedKeysCommandUser root

Replace the <FILL ME> with the values based on the generated files.

  2. Seal the secret:
user@local:/ClusterFactory
cfctl kubeseal
  3. Apply the SealedSecret:
user@local:/ClusterFactory
kubectl apply -f argo/slurm-cluster/secrets/login-sshd-sealed-secret.yaml

Home directory for the LDAP users

If you have configured your LDAP server, you might have to change the homeDirectory to something like /home/ldap-users.

You must mount the home directory of the LDAP users using NFS.

DO NOT use a StorageClass, since the provisioning is static. We don't want to create a volume per replica; there is only one common volume.

argo/slurm-cluster/volumes/ldap-users-<cluster name>-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ldap-users-<cluster name>-pv
  namespace: slurm-cluster
  labels:
    app: slurm-login
    topology.kubernetes.io/region: <FILL ME> # <country code>-<city>
    topology.kubernetes.io/zone: <FILL ME> # <country code>-<city>-<index>
spec:
  capacity:
    storage: 1000Gi
  mountOptions:
    - hard
    - nfsvers=4.1
    - noatime
    - nodiratime
  csi:
    driver: nfs.csi.k8s.io
    readOnly: false
    volumeHandle: <unique id> # uuidgen
    volumeAttributes:
      server: <FILL ME> # IP or host
      share: <FILL ME> # /srv/nfs/k8s/ldap-users
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ldap-users-<cluster name>-pvc
  namespace: slurm-cluster
  labels:
    app: slurm-login
    topology.kubernetes.io/region: <FILL ME> # <country code>-<city>
    topology.kubernetes.io/zone: <FILL ME> # <country code>-<city>-<index>
spec:
  volumeName: ldap-users-<cluster name>-pv
  accessModes: [ReadWriteMany]
  storageClassName: ''
  resources:
    requests:
      storage: 1000Gi

Apply the PV and PVC:

user@local:/ClusterFactory
kubectl apply -f argo/slurm-cluster/volumes/ldap-users-<cluster name>-pv.yaml
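
The PVC should bind immediately, since the volume is statically provisioned:

user@local:/ClusterFactory
kubectl get pv ldap-users-<cluster name>-pv
kubectl get pvc -n slurm-cluster ldap-users-<cluster name>-pvc
# Both should report STATUS Bound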

6.b. Values: Enable SLURM Login

helm/slurm-cluster/values-<cluster name>.yaml
login:
  enabled: true
  replicas: 1

  sshd:
    secretName: login-sshd-secret

  nodeSelector:
    topology.kubernetes.io/region: <FILL ME> # <country code>-<city>
    topology.kubernetes.io/zone: <FILL ME> # <country code>-<city>-<index>

  # Extra volume mounts
  volumeMounts:
    - name: ldap-users-pvc
      mountPath: /home/ldap-users

  # Extra volumes
  volumes:
    - name: ldap-users-pvc
      persistentVolumeClaim:
        claimName: ldap-users-<cluster name>-pvc

  service:
    enabled: true
    type: ClusterIP
    # Use LoadBalancer to expose via MetalLB
    # type: LoadBalancer

    # annotations:
    #   metallb.universe.tf/address-pool: slurm-ch-basel-1-pool

  # Expose via IPVLAN, can be unstable.
  # Using IPVLAN permits srun commands.
  net:
    enabled: false
    # Kubernetes host interface
    type: ipvlan
    masterInterface: eth0
    mode: l2

    # https://www.cni.dev/plugins/current/ipam/static/
    ipam:
      type: static
      addresses:
        - address: 192.168.0.20/24
          gateway: 192.168.0.1

Edit the values accordingly.

warning

Service or IPVLan?

A Kubernetes Service offers many advantages, while IPVLAN only solves a narrow problem.

We strongly recommend using a Kubernetes Service to expose your login node, as it provides load balancing and is easy to configure.

Kubernetes LoadBalancer Service:

  • Provides limited control over networking, as it only exposes a single IP address for a Kubernetes Service.
  • Is a simple and straightforward way to expose a Kubernetes Service to the internet.
  • Can only expose a set of ports.

Multus CNI (IPVLAN):

  • Allows more fine-grained control over networking by letting a pod use multiple network interfaces, each with its own IP address and route table.
  • Can be more complex to set up than a simple LoadBalancer Service, as it requires more configuration and setup time.
  • Allows a pod to connect directly to the host network.

As a result, using a Kubernetes LoadBalancer Service will render the Slurm srun commands inoperable (although sbatch will work and is the preferred method for job submission). On the other hand, adopting Multus CNI eliminates the load-balancing feature and can lead to instability.

warning

Because k8s-pod-network is the default network, you must add routes to your other networks.

For example, if you have two sites, 10.10.0.0/24 and 10.10.1.0/24, you would write:

ipam:
  type: static
  addresses:
    - address: 192.168.0.20/24
      gateway: 192.168.0.1
  routes:
    - dst: 10.10.1.0/24

If you kubectl exec into the container and run ip route, you will see something like:

root@slurm-cluster-<cluster name>-login-b9b7cd9d5-9ntkn
# ip route
default via 169.254.1.1 dev eth0
10.10.0.0/24 via 10.10.0.1 dev net1
10.10.1.0/24 via 10.10.0.1 dev net1
169.254.1.1 dev eth0 scope link
10.10.0.0/20 via 10.10.0.1 dev net1
10.10.0.0/20 dev net1 proto kernel scope link src 10.10.0.21

The issue is tracked at deepsquare-io/ClusterFactory#29 and projectcalico/calico#5199.

If you are using LDAPS and the CA is private, add these values:

helm/slurm-cluster/values-<cluster name>.yaml
login:
  # ...
  command: ['sh', '-c', 'update-ca-trust && /init']

  volumeMounts:
    - name: ldap-users-pvc
      mountPath: /home/ldap-users
    - name: ca-cert
      mountPath: /etc/pki/ca-trust/source/anchors/example.com.ca.pem
      subPath: example.com.ca.pem

  volumes:
    - name: ldap-users-pvc
      persistentVolumeClaim:
        claimName: ldap-users-<cluster name>-pvc
    - name: ca-cert
      secret:
        secretName: local-ca-secret

local-ca-secret is a Secret containing example.com.ca.pem.

You can deploy it:

git add .
git commit -m "Added SLURM Login values"
git push
user@local:/ClusterFactory
# This is optional if the application is already deployed.
kubectl apply -f argo/slurm-cluster/apps/slurm-cluster-<cluster name>-app.yaml

6.c Testing: Access to a SLURM Login node

Because the container is exposed to the external network, you should be able to ssh directly to the login node.

user@local
ssh user@login-node

If the LDAP user named user exists, the login node should prompt for a password.
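
Once logged in, you can submit a small test job to validate the whole chain (login node, controller, and compute nodes); the commands below are standard Slurm usage:

user@login-node
sinfo
sbatch --wrap "hostname"
squeue
cat slurm-*.out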