In this section, you will run a GROMACS MPI job distributed between two c5n.18xlarge nodes in your Kubernetes cluster.
Execute the code block below to create a file named gromacs-mpi.yaml, containing an MPIJob manifest:
cat > ~/environment/gromacs-mpi.yaml << EOF
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: gromacs-mpi
  namespace: gromacs
spec:
  slotsPerWorker: 48
  runPolicy:
    cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          restartPolicy: OnFailure
          volumes:
            - name: cache-volume
              emptyDir:
                medium: Memory
                sizeLimit: 2048Mi
            - name: data
              persistentVolumeClaim:
                claimName: fsx-pvc
          initContainers:
            - image: "${IMAGE_URI}"
              name: init
              command: ["sh", "-c", "cp /inputs/* /data; sleep 5"]
              volumeMounts:
                - name: data
                  mountPath: /data
          containers:
            - image: "${IMAGE_URI}"
              imagePullPolicy: Always
              name: gromacs-mpi-launcher
              volumeMounts:
                - name: cache-volume
                  mountPath: /dev/shm
                - name: data
                  mountPath: /data
              env:
                - name: OMPI_MCA_verbose
                  value: "1"
              command:
                - /opt/view/bin/mpirun
                - --allow-run-as-root
                - --oversubscribe
                - -x
                - FI_LOG_LEVEL=warn
                - -x
                - FI_PROVIDER=shm,sockets
                - -np
                - "48"
                - -npernode
                - "48"
                - --bind-to
                - "core"
                - /opt/view/bin/gmx_mpi
                - mdrun
                - -ntomp
                - "1"
                - -deffnm
                - "/data/md_0_1"
                - -s
                - "/data/md_0_1.tpr"
    Worker:
      replicas: 1
      template:
        spec:
          volumes:
            - name: cache-volume
              emptyDir:
                medium: Memory
                sizeLimit: 2048Mi
            - name: data
              persistentVolumeClaim:
                claimName: fsx-pvc
          containers:
            - image: "${IMAGE_URI}"
              imagePullPolicy: Always
              name: gromacs-mpi-worker
              volumeMounts:
                - name: cache-volume
                  mountPath: /dev/shm
                - name: data
                  mountPath: /data
              resources:
                limits:
                  memory: 8000Mi
                requests:
                  memory: 8000Mi
EOF
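In this manifest, the Launcher pod runs mpirun while the Worker pod hosts the MPI ranks; slotsPerWorker together with the -np and -npernode options controls how many ranks are started. The manifest assumes that the Kubeflow MPI Operator (v2beta1 API) is installed in the cluster and that the IMAGE_URI environment variable points to your GROMACS container image. As an optional sanity check before applying it, you can verify both:
kubectl get crd mpijobs.kubeflow.org
grep "image:" ~/environment/gromacs-mpi.yaml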
To launch the GROMACS MPIJob, execute:
kubectl apply -f ~/environment/gromacs-mpi.yaml
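You can also check the status of the MPIJob resource itself:
kubectl -n gromacs get mpijob gromacs-mpi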
Watch the pods in the gromacs namespace until the launcher pod enters Running state.
kubectl get pods -n gromacs -w
Press Ctrl-C to exit.
Follow the launcher logs while the pod is in Running state.
kubectl -n gromacs logs -f $(kubectl -n gromacs get pods | grep gromacs-mpi-launcher | head -n 1 | cut -d ' ' -f 1)
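If your version of the MPI Operator applies the training.kubeflow.org labels to the pods it creates (an assumption you can verify with kubectl -n gromacs get pods --show-labels), a label selector avoids the grep:
kubectl -n gromacs logs -f -l training.kubeflow.org/job-name=gromacs-mpi,training.kubeflow.org/job-role=launcher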
You should see GROMACS log entries similar to the ones shown below. The output will pause at the line "50000 steps, 100.0 ps." while the simulation runs and will report the remaining lines when the simulation completes.
...
50000 steps, 100.0 ps.
Writing final coordinates.
Dynamic load balancing report:
DLB was off during the run due to low measured imbalance.
Average load imbalance: 24.7%.
The balanceable part of the MD step is 56%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 13.8%.
Average PME mesh/force load: 1.091
Part of the total run time spent waiting due to PP/PME imbalance: 4.6 %
NOTE: 13.8 % of the available CPU time was lost due to load imbalance
in the domain decomposition.
Dynamic load balancing was automatically disabled, but it might be beneficial to manually tuning it on (option -dlb on.)
You can also consider manually changing the decomposition (option -dd);
e.g. by using fewer domains along the box dimension in which there is
considerable inhomogeneity in the simulated system.
               Core t (s)   Wall t (s)        (%)
       Time:     2971.865       41.276     7199.9
                 (ns/day)    (hour/ns)
Performance:      209.325        0.115
Press Ctrl-C to stop following the log.
While waiting for the GROMACS MPIJob to complete, explore the cluster utilization using either kubectl top node or the Grafana dashboards.
Example:
kubectl top node
Output:
NAME                             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-192-168-86-187.ec2.internal   36114m       37%    5275Mi          1%
You should notice increased utilization of the cluster node cores, indicating that the MPIJob is running.
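You can also check per-pod resource usage:
kubectl top pod -n gromacs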
After the GROMACS MPIJob has completed, delete it by executing:
kubectl delete -f ~/environment/gromacs-mpi.yaml
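To confirm the cleanup, verify that the launcher and worker pods are gone:
kubectl -n gromacs get pods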
Congratulations, you have successfully run a tightly coupled MPI job on two nodes using GROMACS to simulate a protein (lysozyme) in a box of water with ions.
Once the job has completed, the output files are stored on the FSx volume. To verify this, we will mount the volume in a new pod and open a shell in that pod.
Execute the code below to create a file named fsx-data.yaml, containing the pod manifest:
cat > ~/environment/fsx-data.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: fsx-data
  namespace: gromacs
spec:
  containers:
    - name: app
      image: amazonlinux:2
      command: ["/bin/sh"]
      args: ["-c", "while true; do date; sleep 5; done"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fsx-pvc
EOF
Create the pod.
kubectl apply -f ~/environment/fsx-data.yaml
Check the pod status.
kubectl -n gromacs get pods
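Alternatively, you can block until the pod reports Ready:
kubectl -n gromacs wait --for=condition=Ready pod/fsx-data --timeout=120s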
Once the pod is in Running state, open a shell into it.
kubectl -n gromacs exec -it $(kubectl -n gromacs get pods | grep fsx-data | head -n 1 | cut -d ' ' -f 1) -- bash
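Because the pod name is fixed in the manifest, the following shorter form is equivalent:
kubectl -n gromacs exec -it fsx-data -- bash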
List the file systems mounted in the pod:
df -h
Output:
Filesystem                     Size  Used Avail Use% Mounted on
overlay                         30G  6.7G   24G  23% /
tmpfs                           64M     0   64M   0% /dev
tmpfs                          185G     0  185G   0% /sys/fs/cgroup
192.168.106.172@tcp:/aqbp5bmv  1.1T   16M  1.1T   1% /data
/dev/nvme0n1p1                  30G  6.7G   24G  23% /etc/hosts
shm                             64M     0   64M   0% /dev/shm
tmpfs                          185G   12K  185G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                          185G     0  185G   0% /proc/acpi
tmpfs                          185G     0  185G   0% /sys/firmware
Notice that the FSx volume is mounted under /data.
Check that the output data from the GROMACS MPI job is in the /data directory.
ls -alh /data
You should see a list of files like the one below:
...
total 9.4M
drwxr-xr-x 3 root root 33K Sep 30 04:29 .
drwxr-xr-x 1 root root 29 Sep 30 04:34 ..
-rw-r--r-- 1 root root 797K Sep 30 04:29 md_0_1.cpt
-rw-r--r-- 1 root root 8.2K Sep 30 04:29 md_0_1.edr
-rw-r--r-- 1 root root 2.3M Sep 30 04:29 md_0_1.gro
-rw-r--r-- 1 root root 36K Sep 30 04:29 md_0_1.log
-rw-r--r-- 1 root root 1.3M Sep 30 04:29 md_0_1.xtc
These are the files that the GROMACS simulation produced:
- md_0_1.log contains the GROMACS run output logs (open it up and inspect).
- md_0_1.gro contains the encoded protein structure.
- md_0_1.xtc contains particle trajectory information.
- md_0_1.edr contains information about physical quantities, like energy, temperature, and pressure.
- md_0_1*.cpt contains checkpoint/restore data (can be used to resume the simulation).
When you are done inspecting the data files, exit the data pod shell.
exit
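If you want to copy output files to your local machine, one option is kubectl cp through the fsx-data pod (a sketch; it requires tar inside the container, and you may need to adjust the destination paths):
kubectl -n gromacs cp fsx-data:/data/md_0_1.gro ./md_0_1.gro
kubectl -n gromacs cp fsx-data:/data/md_0_1.xtc ./md_0_1.xtc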
If the output data is copied to a Linux desktop, it can be visualized with the VMD tool by using the following command (not covered here):
/usr/local/bin/vmd md_0_1.gro md_0_1.xtc
Visualized, the output looks like the image below and can be displayed as a movie over time: