Commit b5e2d0f8 authored by hucscsys

basic slurm guide

parent 697543c1
# Submitting Jobs
In High-Performance Computing (HPC) environments, jobs are submitted to a job scheduler, which dispatches and executes them on the cluster. Jobs are typically non-interactive and may be queued for batch processing depending on demand and resource availability.
This guide provides the basic steps to submit a job to the cluster and monitor its status using Slurm.
## 1. Creating a Job Script
A job script is a shell script containing directives and commands that tell the job scheduler (here, Slurm) how to run your job. It typically includes:
- **Resource Specifications**: Indicate the resources needed (like the number of nodes, CPUs per node, memory, and runtime).
- **Environment Setup**: Load necessary modules or set environment variables.
- **Execution Commands**: The actual commands to run your job.
Here's an example job script (named `example_job.sh`):
```bash
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --output=result.txt
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1000
module load python/3.8
python my_script.py
```
Explanation:
- `#!/bin/bash`: This line indicates that the script should be run in the bash shell.
- `#SBATCH --job-name`: Sets the name of the job.
- `#SBATCH --output`: Specifies where to write the job's standard output.
- `#SBATCH --ntasks`: Number of tasks. In this case, it's a single-task job.
- `#SBATCH --time`: The maximum time for the job (here, 10 minutes).
- `#SBATCH --mem-per-cpu`: Memory per CPU in megabytes.
- `module load python/3.8`: Loads the Python module.
- `python my_script.py`: The command to run your Python script.
You should adjust the resource specifications to match your job's requirements and the cluster's policies.
**Modules and Environment**: The environment setup in the script, e.g. via `module load`, depends on the software and modules available on the cluster.
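The script above covers the essentials. As a sketch of how the other resources mentioned earlier (nodes, CPUs per node, memory) can be requested, a more fully specified variant might look like the following; the partition name `general` is an assumption and, like the module version, will differ between clusters:
```bash
#!/bin/bash
#SBATCH --job-name=my_parallel_job
#SBATCH --output=parallel_result.txt
#SBATCH --nodes=1               # number of nodes
#SBATCH --ntasks=1              # number of tasks
#SBATCH --cpus-per-task=4       # CPUs per task, e.g. for multithreaded code
#SBATCH --time=01:00:00         # maximum runtime (here, 1 hour)
#SBATCH --mem-per-cpu=2000      # memory per CPU in megabytes
#SBATCH --partition=general     # assumed partition name; check your cluster's documentation

module load python/3.8
python my_script.py
```
See `man sbatch` for the full list of directives supported by your installation.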
## 2. Submitting the Job using `sbatch`
To submit the job, use the `sbatch` command:
```bash
sbatch example_job.sh
```
This command sends your job script to the Slurm scheduler, which queues it for execution based on available resources and your script's resource requirements.
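On success, `sbatch` prints a confirmation such as `Submitted batch job 12345` (the job ID shown here is illustrative; the real value is assigned by the scheduler). If you want to capture the job ID for use with the monitoring commands in the next section, `sbatch --parsable` prints only the ID:
```bash
# Submit the job and keep just the job ID reported by the scheduler
job_id=$(sbatch --parsable example_job.sh)
echo "Submitted job ${job_id}"
```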
## 3. Monitoring the Job
After submitting, you can monitor your job's status and manage your queued jobs with the commands below; a combined example follows the list.
- **Check Job Status**: Use the `squeue` command to see all running and queued jobs. To see only your jobs:
```bash
squeue -u your_username
```
- **View Job Information**: To get more detailed information about a specific job, use:
```bash
scontrol show job your_job_id
```
- **Canceling a Job**: If you need to cancel a job, use:
```bash
scancel your_job_id
```
- **Checking Output**: The output of your job (if any) will be written to the file specified in the `--output` directive of your script (in this case, `result.txt`).
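Putting these commands together, a typical monitoring session might look like the sketch below; `12345` is a placeholder for the job ID reported by `sbatch`:
```bash
# List only your own running and queued jobs
squeue -u $USER

# Show the full record for one job (state, requested resources, node list)
scontrol show job 12345

# Cancel the job if it is no longer needed
scancel 12345

# Once the job has finished, inspect the file named in --output
cat result.txt
```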