feat: add cotainr Slurm job and docs

Anthony Berg 2025-04-01 14:52:30 +02:00
parent 22563df94f
commit 1056ecea67
2 changed files with 43 additions and 2 deletions

Jobs/job_apptainer_lumi.slurm

@@ -0,0 +1,26 @@
#!/bin/bash -l
#SBATCH --job-name=lumi
#SBATCH --account=project_4650000xx
#SBATCH --time=00:10:00
#SBATCH --partition=dev-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --output=%x-%j.out
#SBATCH --exclusive

N=$SLURM_JOB_NUM_NODES
echo "-- number of nodes: $N"
echo "-- total number of MPI tasks (one per GPU): $SLURM_NTASKS"

# Paths to the simulator and the container image; adjust to your project.
MyDir=/project/project_4650000xx
MyApplication=${MyDir}/FiniteVolumeGPU_HIP/mpiTesting.py
Container=${MyDir}/FiniteVolumeGPU_HIP/my_container.sif

# Bind each MPI task to a CPU core close to its assigned GPU on a LUMI-G node.
CPU_BIND="map_cpu:49,57,17,25,1,9,33,41"

# Enable GPU-aware MPI in Cray MPICH.
export MPICH_GPU_SUPPORT_ENABLED=1

srun --cpu-bind=${CPU_BIND} --mpi=pmi2 \
    apptainer exec "${Container}" \
    python ${MyApplication} -nx 1024 -ny 1024 --profile


@@ -17,19 +17,34 @@ conda-containerize new --prefix MyCondaEnv conda_environment_lumi.yml
where the file `conda_environment_lumi.yml` contains packages to be installed.
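For reference, a minimal environment file could look like the sketch below; the environment name and package list are illustrative, and the real file must list the simulator's actual dependencies:
```shell
# Illustrative only: write a minimal conda environment file.
# Replace the package list with the simulator's real dependencies.
cat > conda_environment_lumi.yml << 'EOF'
name: fvgpu-env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - mpi4py
EOF
```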
### Step 1 alternative: Convert to a Singularity container with cotainr
Load the required modules first:
```shell
ml CrayEnv
ml cotainr
```
Then build the Singularity/Apptainer container:
```shell
cotainr build my_container.sif --system=lumi-g --conda-env=conda_environment_lumi.yml
```
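After the build finishes, a quick sanity check of the image (assuming `apptainer` is available on the login node) can save a failed job submission:
```shell
# Run Python inside the freshly built container to verify it works.
apptainer exec my_container.sif python --version
```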
### Step 2: Modify the Slurm job file
Depending on your build method, update [`Jobs/job_lumi.slurm`](Jobs/job_lumi.slurm) if `conda-containerize` was used, or [`Jobs/job_apptainer_lumi.slurm`](Jobs/job_apptainer_lumi.slurm) if `cotainr` was used.
In the job file, change the project allocation and the directories where the simulator and the container are stored to match your setup.
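For instance, these are the lines in [`Jobs/job_apptainer_lumi.slurm`](Jobs/job_apptainer_lumi.slurm) that typically need to be adapted:
```shell
#SBATCH --account=project_4650000xx                       # your project allocation
MyDir=/project/project_4650000xx                          # your project directory
Container=${MyDir}/FiniteVolumeGPU_HIP/my_container.sif   # path to the container image
```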
### Step 3: Run the Slurm job
If `conda-containerize` was used for building:
```shell
sbatch Jobs/job_lumi.slurm
```
Otherwise, if `cotainr` was used for building:
```shell
sbatch Jobs/job_apptainer_lumi.slurm
```
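Once submitted, you can check the job's status and follow its output; with the settings above, the output file is named from the job name and job ID (`%x-%j.out`):
```shell
squeue --me               # list your queued and running jobs
tail -f lumi-<jobid>.out  # follow the job output once it starts
```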
### Troubleshooting
#### Error when running MPI