Commit e99d3f82 authored by Mani Tofigh

1) Deleted jobs/job-submission.md as it was from the older version and is now being split into different files. 2) Updated writing-support-request.md
parent 54ebcab5
@@ -23,7 +23,7 @@ as there are other systems that are managed by us.
## Please do not treat us as "Let me Google that for you" assistants
Have you searched the internet with the exact error message
and the name of your application...? Other scientists may have had the
and the name of your application? Other scientists may have had the
very same problem and might have been successful in solving it. By the
way, that's almost always how we start to research issues too!
@@ -31,30 +31,27 @@ way, that's almost always how we start to research issues too!
Your subject line should be descriptive. "Problem on Star" is not a
good subject line since it could be valid for basically every support
email we get. The support staff is a team. The subjects are the first
thing that we see. We would like to be able to classify emails according
to subjects before even opening the email.
E-mail we get. The support staff is a team. The subjects are the first
thing that we see. We would like to be able to classify E-mails according
to subjects before even opening the E-mail.
## Include the actual commands and error messages
We cover this below, but it's so important it needs to be mentioned at
the top, too: include the actual command you run and actual error
messages. Copy and paste. If you don't include this, we will be slightly
annoyed and immediately ask this.
the top, too: <br>
Include the actual commands you run and the actual error
messages you receive. Copy and paste please. If you don't include this, we have to immediately ask you for it before proceeding with the issue.
Please, do not screen shoot your ssh terminal and send us pictures (jpg,
png, tiff...) of what you saw on your monitor! From these, we would be
unable to cut & paste commands or error messages, unnecessarily slowing
down our research on your problem. Your sample output does not need at
all to "look good", and we don't need to know what fancy ssh- or
terminal software you have: a simple text-based cut & paste directly
into the mail or ticket is the best we can work on with.
Please, do not screenshot your ssh terminal and send us pictures (jpg,
png, tiff, etc) of what you saw on your monitor. From these, we would be
unable to copy and paste commands or error messages, unnecessarily slowing
down our research on your problem.
## New problem--new email
## New problem--new E-mail
Do not send support requests by replying to unrelated issues. Every
issue gets a number and this is the number that you see in the subject
line. Replying to unrelated issues means that your email gets filed
line. Replying to unrelated issues means that your E-mail gets filed
under the wrong thread and risks being overlooked.
## The XY problem
@@ -88,7 +85,7 @@ nodes". The request then does not mention whether it worked on one node
or on one core or whether it never worked and that this was the first
attempt. Perhaps the problem has even nothing to do with one or two
nodes. In order to better isolate the problem and avoid wasting time
with many back and forth emails, please tell us what actually worked so
with many back and forth E-mails, please tell us what actually worked so
far. Tell us what you have tried to isolate the problem. This requires
some effort from you but this is what we expect from you.
......
---
sort: 100
---
# Creating and Submitting Jobs
In High-Performance Computing (HPC) environments, jobs are submitted to a job scheduler for dispatch and execution on the cluster. Jobs are typically non-interactive and may be queued for batch processing, based on demand and resource availability.
This guide provides the basic steps to submit a job to the cluster and monitor its status using Slurm.
## 1. Creating a Job Script
A job script is a shell script containing directives and commands that tell the job scheduler (here, Slurm) how to run your job. It typically includes:
- **Resource Specifications**: Indicate the resources needed (like the number of nodes, CPUs per node, memory, and runtime).
- **Environment Setup**: Load necessary modules or set environment variables.
- **Execution Commands**: The actual commands to run your job.
Here's an example job script (named `example_job.sh`):
```bash
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --output=result.txt
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1000
module load python/3.8
python my_script.py
```
Explanation:
- `#!/bin/bash`: This line indicates that the script should be run in the bash shell.
- `#SBATCH --job-name`: Sets the name of the job.
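- `#SBATCH --output`: Specifies the file that receives the job's standard output. Standard error goes to the same file unless `--error` is also set.
- `#SBATCH --ntasks`: Number of tasks. In this case, it's a single-task job.
- `#SBATCH --time`: The maximum run time for the job (here `10:00`, in `minutes:seconds` format, means 10 minutes).
- `#SBATCH --mem-per-cpu`: Memory per CPU, in megabytes when no unit suffix is given.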
- `module load python/3.8`: Loads the Python module.
- `python my_script.py`: The command to run your Python script.
Adjust the resource specifications to match your job's requirements and the cluster's policies.
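One way to keep resource requests easy to adjust is to generate the job script from a small template. The following is only a sketch: the helper name `write_job_script` and its argument order are hypothetical, and it assumes the same `python/3.8` module and `my_script.py` as the example above. The `%j` in the output filename is a Slurm filename pattern that is replaced by the job ID, so each run writes to its own file.

```bash
#!/bin/bash
# Sketch: write a job script with adjustable resource requests.
# write_job_script is a hypothetical helper, not part of Slurm.
write_job_script() {
    local out_file="$1"     # path of the job script to create
    local job_name="$2"     # value for --job-name
    local time_limit="$3"   # value for --time, e.g. 10:00 (10 minutes)
    local mem_mb="$4"       # value for --mem-per-cpu, in megabytes

    # Unquoted heredoc: ${...} is expanded now, %j is left for Slurm.
    cat > "$out_file" <<EOF
#!/bin/bash
#SBATCH --job-name=${job_name}
#SBATCH --output=${job_name}_%j.out
#SBATCH --ntasks=1
#SBATCH --time=${time_limit}
#SBATCH --mem-per-cpu=${mem_mb}

module load python/3.8
python my_script.py
EOF
}
```

For example, `write_job_script example_job.sh my_test_job 10:00 1000` would recreate the script shown above, except that the output goes to a per-job file.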
**Modules and Environment**: The environment setup in the script, e.g. via `module load`, depends on the software and modules available on the cluster.
You can use our [SLURM Job Script Generator](https://manitofigh.github.io/SlurmJobGeneration/) to help you build the script you need.
## 2. Submitting the Job using `sbatch`
To submit the job, use the `sbatch` command:
```bash
sbatch example_job.sh
```
This command sends your job script to the Slurm scheduler, which queues it for execution based on the available resources and your script's resource requirements.
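On success, `sbatch` normally prints a confirmation line of the form `Submitted batch job <id>`. If you want to reuse the job ID in later commands, a small helper can extract it. This is a minimal sketch: `parse_job_id` is a hypothetical function name, and it assumes the default confirmation format.

```bash
#!/bin/bash
# Sketch: extract the numeric job ID from sbatch's confirmation line.
# parse_job_id is a hypothetical helper, not a Slurm command.
parse_job_id() {
    local line="$1"
    # Default sbatch confirmation: "Submitted batch job <id>"
    if [[ "$line" =~ ^Submitted\ batch\ job\ ([0-9]+) ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        echo "could not find a job ID in: $line" >&2
        return 1
    fi
}

# On a cluster you might write:
#   job_id=$(parse_job_id "$(sbatch example_job.sh)")
```

Alternatively, `sbatch --parsable example_job.sh` prints just the job ID (followed by `;` and the cluster name on multi-cluster setups), which avoids parsing the sentence at all.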
## 3. Monitoring the Job
After submitting, you can monitor your job's status and view your job queue.
- **Check Job Status**: Use the `squeue` command to see all running and queued jobs. To see only your jobs:
```bash
squeue -u your_username
```
- **View Job Information**: To get more detailed information about a specific job, use:
```bash
scontrol show job your_job_id
```
- **Canceling a Job**: If you need to cancel a job, use:
```bash
scancel your_job_id
```
- **Checking Output**: The output of your job (if any) will be written to the file specified in the `--output` directive of your script (in this case, `result.txt`).
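The monitoring commands above can be combined into a simple wait loop. The sketch below polls `squeue` until the job no longer appears in the queue. The function name `wait_for_job` and the 30-second default interval are hypothetical choices, so pick an interval that does not put unnecessary load on the scheduler.

```bash
#!/bin/bash
# Sketch: block until a job no longer appears in the queue.
# wait_for_job and its default poll interval are hypothetical choices.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"   # seconds between checks

    # squeue -h suppresses the header line; -j restricts output to this job.
    # Empty output means the job has left the queue
    # (it finished, failed, or was cancelled).
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "job $job_id is no longer in the queue"
}
```

For example, `wait_for_job 12345 60` checks once a minute. The loop only reports that the job has left the queue, not whether it succeeded, so check the output file afterwards.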