@@ -77,7 +77,7 @@ Lines 2-7 are your `SBATCH` directives. These lines are where you specify differ
-`#SBATCH --output=test_job.out`: Specifies where your output file is generated and what it is named. In this example, we have provided only a filename, not a path. When you use the `--output` directive with just a filename, Slurm stores the output file in the current working directory from which the `sbatch` command was executed.
-`#SBATCH --error=test_job.err`: Functions similarly to `--output`, except that it captures any error messages generated during the execution of your job. **The `.err` file is always generated even if your job completes successfully; it will simply be empty if there are no errors.**
-`#SBATCH --nodes=1`: Requests one node for your job. This directive tells the scheduler "run my job on any available node you find; I don't care which one". **It's also possible to specify the name of the node(s) you'd like to use, which we will cover in future examples.**
-`#SBATCH --time=10:00`: Specifies how long you want your job to run once it has started execution. In this example, the job requests up to 10 minutes of run time. The scheduler will **terminate** any job that runs longer than this value plus a 15-minute grace period. Acceptable time formats include `mm`, `mm:ss`, `hh:mm:ss`, `days-hh`, `days-hh:mm` and `days-hh:mm:ss`. This parameter is also known as the walltime, to distinguish it from CPU time, which takes into account the number of CPUs consumed. As a simple example of the difference: if a job runs for one hour using two CPU cores, the walltime is one hour while the CPU time is 1 hour x 2 CPUs = 2 hours.
-`#SBATCH --mem=1G`: Specifies the maximum main memory required _per_ node. In this case we set the cap to 1 gigabyte. If you don't specify a memory unit, Slurm assumes megabytes: `#SBATCH --mem=4096` requests 4096 MB of RAM. **If you want to request all the memory on a node, you can use** `--mem=0`.
After the last `#SBATCH` directive, commands are run just as in any other regular shell script.
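Putting the directives above together, a minimal submission script could look like the following sketch (the final `hostname` command is just a placeholder for your actual workload):

```bash
#!/bin/bash
#SBATCH --output=test_job.out   # stdout is written here, in the directory you ran sbatch from
#SBATCH --error=test_job.err    # stderr is written here; the file is created even on success
#SBATCH --nodes=1               # run on any one available node
#SBATCH --time=10:00            # walltime limit of 10 minutes
#SBATCH --mem=1G                # request 1 GB of main memory per node

# Regular shell commands follow the directives and run on the allocated node
hostname
```

You would submit this with something like `sbatch test_job.sh` (the script name here is arbitrary); the `.out` and `.err` files then appear in the directory you submitted from.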
...
...
@@ -160,7 +160,7 @@ This option requests 4 GB of memory per allocated CPU.
--mail-user=your_email@example.com
```
This configuration sends an email to the specified address at the start, completion, and failure of the job.
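As a hedged sketch, the notification events described above correspond to Slurm's `BEGIN`, `END`, and `FAIL` mail types; if your script does not already include the `--mail-type` line, the pair of directives would look like this (the address is a placeholder):

```bash
#SBATCH --mail-type=BEGIN,END,FAIL          # email at job start, completion, and failure
#SBATCH --mail-user=your_email@example.com  # where the notifications are sent
```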
@@ -14,6 +14,7 @@ Users run many different applications on the cluster based on their needs, such
Containerization is also increasingly popular in HPC: it provides isolated environments that allow for the reuse of images, giving better reproducibility and software portability without the performance impact of other methods or the hassle of manually installing dependencies. Containers are run using Apptainer (formerly Singularity), a containerization platform similar to Docker, with the major difference that it runs under user privileges instead of `root`. Users can deploy images from NGC (NVIDIA GPU Cloud), which provides access to a wide array of pre-built images with GPU-optimized software for diverse applications. Leveraging container images can save a lot of time, as users don't need to set up software applications from scratch and can simply pull and use the NGC images with Apptainer.
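As a rough illustration (the specific image name and tag below are placeholders chosen for this example, not a recommendation), pulling an NGC image and running a command inside it with Apptainer looks like this:

```bash
# Pull a GPU-optimized PyTorch image from NGC; Apptainer converts it to a local .sif file
apptainer pull pytorch.sif docker://nvcr.io/nvidia/pytorch:24.01-py3

# Execute a command inside the container; --nv exposes the host's NVIDIA GPUs and driver libraries
apptainer exec --nv pytorch.sif python -c "import torch; print(torch.cuda.is_available())"
```

Inside a batch job, these commands go after the `#SBATCH` directives just like any other shell command.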