Commit 57b746d7 authored by Alexander Rosenberg's avatar Alexander Rosenberg

Merge branch 'aish-jsro-guide' into master

parents 961c9645 40d9b9a2
......@@ -150,8 +150,6 @@ You can find job script examples at [Submitting jobs]({{site.baseurl}}{% link jo
### When will my job start?
How can I find out when my job will start?
To find out approximately when the job scheduler thinks your job will
start, use the command:
......
......@@ -25,9 +25,9 @@ way, that's almost always how we start to research issues too!
Your subject line should be descriptive. "Problem on Star" is not a
good subject line since it could be valid for basically every support
Email we get. The support staff is a team. The subjects are the first
thing that we see. We would like to be able to classify Email according
to subjects before even opening the Email.
E-mail we get. The support staff is a team. The subjects are the first
thing that we see. We would like to be able to classify E-mails according
to subjects before even opening the E-mail.
## Include the actual commands and error messages
......@@ -45,7 +45,7 @@ down our research on your problem.
Do not send support requests by replying to unrelated issues. Every
issue gets a number and this is the number that you see in the subject
line. Replying to unrelated issues means that your Email gets filed
line. Replying to unrelated issues means that your E-mail gets filed
under the wrong thread and risks being overlooked.
## The XY problem
......@@ -79,7 +79,7 @@ nodes". The request then does not mention whether it worked on one node
or on one core or whether it never worked and that this was the first
attempt. Perhaps the problem has even nothing to do with one or two
nodes. In order to better isolate the problem and avoid wasting time
with many back and forth Emails, please tell us what actually worked so
with many back and forth E-mails, please tell us what actually worked so
far. Tell us what you have tried to isolate the problem. This requires
some effort from you but this is what we expect from you.
......
---
sort: 3
sort: 4
---
# Monitoring Jobs
......
......@@ -77,7 +77,7 @@ Lines 2-7 are your `SBATCH` directives. These lines are where you specify differ
- `#SBATCH --output=test_job.out`: Specifies where your output file is generated and what it is named. In this example, we have provided only a filename, not a path. When you use the `--output` directive with just a filename, Slurm stores the output file in the working directory from which the `sbatch` command was executed.
- `#SBATCH --error=test_job.err`: Functions similarly to `--output`, except the file contains any error messages generated during the execution of your job. **The `.err` file is always going to be generated even if your job executes successfully; however, it will be empty if there are no errors.**
- `#SBATCH --nodes=1`: Requests one node for your job. This directive tells the scheduler "Run my job on any available node you find; I don't care which one". **It's also possible to specify the name of the node(s) you'd like to use, which we will cover in future examples.**
- `#SBATCH --time=10:00`: This line specifies how long you want your job to run, after it's out the queue and starts execution. In this case, the job will be **terminated** after 10 minutes. Acceptable time formats include `mm`, `mm:ss`, `hh:mm:ss`, `days-hh`, `days-hh:mm` and `days-hh:mm:ss`.
- `#SBATCH --time=10:00`: This line specifies how long you want your job to run once it has started execution. In this example, the job requests up to 10 minutes of run time. The scheduler will **terminate** any job that runs longer than this value plus a 15-minute grace period. Acceptable time formats include `mm`, `mm:ss`, `hh:mm:ss`, `days-hh`, `days-hh:mm` and `days-hh:mm:ss`. This parameter is also known as the walltime, to differentiate it from CPU time, which takes into account the number of CPUs consumed. A simple example of the difference: if a job runs for one hour using two CPU cores, the walltime is one hour, while the CPU time is 1 hour × 2 CPUs = 2 hours.
- `#SBATCH --mem=1G`: Specifies the maximum main memory required _per_ node. In this case we set the cap to 1 gigabyte. If you don't specify a memory unit, Slurm defaults to megabytes: `#SBATCH --mem=4096` requests 4096 MB of RAM. **If you want to request all the memory on a node, you can use** `--mem=0`.
After the last `#SBATCH` directive, commands are run like in any other regular shell script.
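Putting these directives together, a minimal job script might look like the following sketch (the job name and the final `echo` command are placeholders, not part of the original example):

```shell
#!/bin/bash
#SBATCH --job-name=test_job   # hypothetical job name
#SBATCH --output=test_job.out # stdout file, created in the submission directory
#SBATCH --error=test_job.err  # stderr file; empty if the job succeeds
#SBATCH --nodes=1             # run on any single available node
#SBATCH --time=10:00          # 10-minute walltime limit
#SBATCH --mem=1G              # 1 GB of memory per node

# Everything below the directives runs as a regular shell script.
echo "Running on $(hostname)"
```

The `#SBATCH` lines are shell comments, so only the scheduler interprets them; when the script itself executes, they are ignored.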
......@@ -160,7 +160,7 @@ This option requests 4 GB of memory per allocated CPU.
--mail-user=your_email@example.com
```
This configuration sends an Email to the specified address at the start, completion, and failure of the job.
This configuration sends an email to the specified address at the start, completion, and failure of the job.
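To get notifications at those three points, `--mail-user` is typically paired with a `--mail-type` directive; a sketch (the address is a placeholder):

```shell
#SBATCH --mail-type=BEGIN,END,FAIL          # notify at job start, completion, and failure
#SBATCH --mail-user=your_email@example.com  # where the notifications are sent
```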
### Interactive Job Submission
......
......@@ -14,6 +14,7 @@ Users run many different applications on the cluster based on their needs, such
Containerization is also increasingly popular in HPC, as it provides isolated environments that allow for the reuse of images for better reproducibility and software portability, without the performance impact of other methods or the hassle of manually installing dependencies. Containers are run using Apptainer (formerly Singularity), a containerization platform similar to Docker, with the major difference that it runs under user privileges instead of `root`. Users can deploy images from NGC (NVIDIA GPU Cloud), which provides access to a wide array of pre-built images with GPU-optimized software for diverse applications. Leveraging container images can save a lot of time, as users don't need to set up software applications from scratch and can simply pull and use the NGC images with Apptainer.
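As a sketch of that workflow (the image name and tag below are hypothetical examples; check the NGC catalog for real ones), pulling and running an NGC image with Apptainer looks like:

```shell
# Pull a container image from NGC into a local .sif file
# (image name/tag is a placeholder)
apptainer pull pytorch.sif docker://nvcr.io/nvidia/pytorch:24.01-py3

# Run a command inside the container; --nv exposes the host's NVIDIA GPUs
apptainer exec --nv pytorch.sif python --version
```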
## Hardware
### Login Node
......