feat: add cotainr Slurm job and docs

Anthony Berg 2025-04-01 14:52:30 +02:00
parent 22563df94f
commit 1056ecea67
2 changed files with 43 additions and 2 deletions

Jobs/job_apptainer_lumi.slurm

@@ -0,0 +1,26 @@
#!/bin/bash -l
#SBATCH --job-name=lumi
#SBATCH --account=project_4650000xx
#SBATCH --time=00:10:00
#SBATCH --partition=dev-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --output=%x-%j.out
#SBATCH --exclusive

N=$SLURM_JOB_NUM_NODES
echo "-- number of nodes: $N"
# One MPI task is launched per GPU, so the task count equals the GPU count.
echo "-- total number of GPUs: $SLURM_NTASKS"

MyDir=/project/project_4650000xx
MyApplication=${MyDir}/FiniteVolumeGPU_HIP/mpiTesting.py
Container=${MyDir}/FiniteVolumeGPU_HIP/my_container.sif

# Map each MPI task to the CPU core closest to its GPU (LUMI-G binding).
CPU_BIND="map_cpu:49,57,17,25,1,9,33,41"

# Enable GPU-aware MPI in Cray MPICH.
export MPICH_GPU_SUPPORT_ENABLED=1

srun --cpu-bind=${CPU_BIND} --mpi=pmi2 \
    apptainer exec "${Container}" \
    python ${MyApplication} -nx 1024 -ny 1024 --profile

README.md

@@ -17,19 +17,34 @@ conda-containerize new --prefix MyCondaEnv conda_environment_lumi.yml
where the file `conda_environment_lumi.yml` contains packages to be installed.
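For reference, such an environment file has the following general shape; the package list below is an illustrative placeholder, not the actual contents of `conda_environment_lumi.yml`:
```yml
name: my_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - mpi4py
```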
### Step 1 alternative: Convert to a Singularity container with cotainr
Load the required modules first
```shell
ml CrayEnv
ml cotainr
```
Then build the Singularity/Apptainer container
```shell
cotainr build my_container.sif --system=lumi-g --conda-env=conda_environment_lumi.yml
```
### Step 2: Modify Slurm Job file
Depending on your build method, update [`Jobs/job_lumi.slurm`](Jobs/job_lumi.slurm) if `conda-containerize` was used, or [`Jobs/job_apptainer_lumi.slurm`](Jobs/job_apptainer_lumi.slurm) if `cotainr` was used.
In the job file, the required changes are the project allocation
and the directories where the simulator and container are stored.
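In `Jobs/job_apptainer_lumi.slurm`, for example, these are the lines to adapt (shown with the placeholder values from this commit):
```shell
#SBATCH --account=project_4650000xx   # your LUMI project allocation
MyDir=/project/project_4650000xx      # where FiniteVolumeGPU_HIP is stored
Container=${MyDir}/FiniteVolumeGPU_HIP/my_container.sif
```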
### Step 3: Run the Slurm Job
If `conda-containerize` was used for building:
```shell
sbatch Jobs/job_lumi.slurm
```
Otherwise, if `cotainr` was used for building:
```shell
sbatch Jobs/job_apptainer_lumi.slurm
```
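In both cases the job's output is written to a file named after the job name and job ID (the apptainer job sets `--output=%x-%j.out`), so a run can be monitored with, for example:
```shell
squeue -u $USER            # is the job pending or running?
tail -f lumi-<jobid>.out   # follow the output; replace <jobid> with the Slurm job ID
```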
### Troubleshooting
#### Error when running MPI.