Commit 56f98ad6 authored by Mani Tofigh

Removed extra slurm files and moved our own version under the stallo's folder and renamed it to Slurm Jobs instead of Jobs
parent 1d64f878
...@@ -2,6 +2,6 @@ ...@@ -2,6 +2,6 @@
sort: 4 sort: 4
--- ---
# Jobs # Slurm Jobs
source: `{{ page.path }}` source: `{{ page.path }}`
# Batch system # Batch system
The Star system is a resource that is shared between many of users and The Star cluster is a resource that is shared between many of users and
to ensure fair use everyone must do their computations by submitting to ensure fair use everyone must do their computations by submitting
jobs through a batch system that will execute the applications on the jobs through a batch system that will execute the applications on the
available resources. available resources.
The batch system on Star is [SLURM](https://slurm.schedmd.com/) The batch system on Star is [SLURM](https://slurm.schedmd.com/)
(Simple Linux Utility for Resource Management.) (Simple Linux Utility for Resource Management). Read more about SLURM <a href="./slurm_parameter.html">here</a>.
<!-- <a href="https://docs.starhpc.hofstra.io/en/latest/jobs/slurm_parameter.html">here</a>. -->
## Creating a job script ## Creating a job script
......
# SLURM Workload Manager # SLURM Workload Manager
SLURM is the workload manager and job scheduler used for Star. Slurm Workload Manager, or SLURM (Simple Linux Utility for Resource Management), is a free and open-source job scheduler for managing workloads on Linux and Unix-based clusters, such as Star.
There are two ways of starting jobs with SLURM; either interactively There are two ways of starting jobs with SLURM; either interactively
with `srun` or as a script with `sbatch`. with `srun` or as a script with `sbatch`.
...@@ -23,6 +23,7 @@ parameter. Replace \<....\> with the value you want, e.g. ...@@ -23,6 +23,7 @@ parameter. Replace \<....\> with the value you want, e.g.
`--job-name=test-job`. `--job-name=test-job`.
### Basic settings: ### Basic settings:
There is a **Slurm Job Script Generator** at <a href="https://manitofigh.github.io/SlurmJobGeneration" target="_blank">this link</a>, which we suggest you use *after* reading this documentation in order to get a better understanding of how the Slurm directives work and how Slurm scripts are meant to be written.
<table> <table>
<thead> <thead>
...@@ -34,11 +35,11 @@ parameter. Replace \<....\> with the value you want, e.g. ...@@ -34,11 +35,11 @@ parameter. Replace \<....\> with the value you want, e.g.
<tbody> <tbody>
<tr class="odd"> <tr class="odd">
<td>--job-name=&lt;name&gt;</td> <td>--job-name=&lt;name&gt;</td>
<td>Job name to be displayed by for example <code>squeue</code></td> <td>Job name to be displayed by <code>squeue</code></td>
</tr> </tr>
<tr class="even"> <tr class="even">
<td>--output=&lt;path&gt;</td> <td>--output=&lt;path&gt;</td>
<td><div class="line-block">Path to the file where the job (error) <td><div class="line-block">Path to the file where the job
output is written to</div></td> output is written to</div></td>
</tr> </tr>
<tr class="odd"> <tr class="odd">
...@@ -57,13 +58,13 @@ of BEGIN, END, FAIL, REQUEUE or ALL</div></td> ...@@ -57,13 +58,13 @@ of BEGIN, END, FAIL, REQUEUE or ALL</div></td>
| Parameter | Function | | Parameter | Function |
|---------------------------------|----------------------------------------------------------------------------------------------------------------------------| |---------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| --time=\<d-hh:mm:ss\> | Time limit for job. Job will be killed by SLURM after time has run out. Format days-hours:minutes:seconds | | - -time=\<hh:mm:ss\> | Time limit for job. Job will be killed by SLURM after time has run out. Format: hours:minutes:seconds |
| --nodes=\<num_nodes\> | Number of nodes. Multiple nodes are only useful for jobs with distributed-memory (e.g. MPI). | | - -nodes=\<num_nodes\> | Number of nodes. Multiple nodes are only useful for jobs with distributed-memory (e.g. MPI). |
| --mem=\<MB\> | Memory (RAM) per node. Number followed by unit prefix, e.g. 16G | | - -mem=\<MB/GB\> | Memory (RAM) per node. Number followed by unit prefix, e.g. 16G |
| --mem-per-cpu=\<MB\> | Memory (RAM) per requested CPU core | | - -mem-per-cpu=\<MB/GB\> | Memory (RAM) per requested CPU core |
| --ntasks-per-node=\<num_procs\> | Number of (MPI) processes per node. More than one useful only for MPI jobs. Maximum number depends nodes (number of cores) | | - -ntasks-per-node=\<num_procs\> | Number of (MPI) processes per node. More than one useful only for MPI jobs. Maximum number depends nodes (number of cores) |
| --cpus-per-task=\<num_threads\> | CPU cores per task. For MPI use one. For parallelized applications benchmark this is the number of threads. | | - -cpus-per-task=\<num_threads\> | CPU cores per task. For MPI use one. For parallelized applications benchmark this is the number of threads. |
| --exclusive | Job will not share nodes with other running jobs. You will be charged for the complete nodes even if you asked for less. | | - -exclusive | Job will not share nodes with other running jobs. You will be charged for the complete nodes even if you asked for less. |
### Accounting ### Accounting
...@@ -99,9 +100,9 @@ been satified. E.g. --dependency=afterok:123456</td> ...@@ -99,9 +100,9 @@ been satified. E.g. --dependency=afterok:123456</td>
</tr> </tr>
<tr class="odd"> <tr class="odd">
<td>--ntasks-per-core=2</td> <td>--ntasks-per-core=2</td>
<td><blockquote> <td><b>
<p>Enables hyperthreading. Only useful in special circumstances.</p> <p>Enables hyperthreading. Only useful in special circumstances.</p>
</blockquote></td> </b></td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
......
---
sort: 3
---
# Slurm
Slurm Workload Manager, or SLURM (Simple Linux Utility for Resource Management), is a free and open-source job scheduler for managing workloads on Linux and Unix-based clusters, grids, and supercomputers. Slurm is widely used in high-performance computing (HPC) environments, where it is used to manage the allocation of resources such as CPU time, memory, and storage across a large number of compute nodes. Slurm provides tools for users to submit, monitor, and control the execution of their jobs. Other key features include support for parallel and serial job execution, support for job dependencies and job arrays, support for resource reservations and QoS (quality of service), and support for job priority and backfilling. Slurm has a modular design that enables it to be highly configurable to be tailored to meet a wide variety of needs in different environments. It is widely used in academia and industry, and is supported by a large and active community of users and developers.
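For readers of this change, a minimal job script using the basic directives listed in the tables above might look like the sketch below. The job name, output file name, and resource values are illustrative assumptions, not settings taken from the Star documentation, and a real job on Star may need further options (such as accounting settings) that are not shown here. The `#SBATCH` lines are read by `sbatch` and are plain comments to bash, so the script can also be run directly to check its body.

```shell
#!/bin/bash
# Illustrative values only; adjust to the limits of your cluster.
#SBATCH --job-name=test-job          # name shown by squeue
#SBATCH --output=test-job-%j.out     # %j is replaced by the job ID
#SBATCH --time=00:10:00              # time limit, hours:minutes:seconds
#SBATCH --nodes=1                    # a single node (no MPI)
#SBATCH --ntasks-per-node=1          # one process on that node
#SBATCH --cpus-per-task=1            # one CPU core for the process
#SBATCH --mem=1G                     # memory per node

# Print a one-line summary of the job. The SLURM_* variables are set by
# Slurm inside a job; the fallbacks apply when the script is run by hand.
describe_job() {
  echo "job=${SLURM_JOB_NAME:-test-job} id=${SLURM_JOB_ID:-none} cpus=${SLURM_CPUS_PER_TASK:-1}"
}

describe_job
```

Saved under a hypothetical name such as `test-job.sh`, the script would be submitted with `sbatch test-job.sh`, and the queued job can then be inspected with `squeue`.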