In High-Performance Computing (HPC) environments, jobs are submitted to a job scheduler for dispatch and execution on the cluster. Jobs are typically non-interactive and may be queued for batch processing, based on demand and resource availability.
This guide provides the basic steps to submit a job to the cluster and monitor its status using Slurm.
## 1. Creating a Job Script
A job script is a shell script containing directives and commands that tell the job scheduler (such as Slurm) how to run your job. It typically includes:
- **Resource Specifications**: Indicate the resources needed (like the number of nodes, CPUs per node, memory, and runtime).
- **Environment Setup**: Load necessary modules or set environment variables.
- **Execution Commands**: The actual commands to run your job.
Here's an example job script (named `example_job.sh`):
```bash
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --output=result.txt
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1000
module load python/3.8
python my_script.py
```
Explanation:
- `#!/bin/bash`: Indicates that the script should be run in the bash shell.
- `#SBATCH --job-name`: Sets the name of the job.
- `#SBATCH --output`: Specifies where to write the job's standard output.
- `#SBATCH --ntasks`: Number of tasks. In this case, it's a single-task job.
- `#SBATCH --time`: The maximum wall-clock time for the job (here, 10 minutes).
- `#SBATCH --mem-per-cpu`: Memory per allocated CPU in megabytes (here, 1000 MB).
- `module load python/3.8`: Loads the Python 3.8 module.
- `python my_script.py`: The command to run your Python script.
Adjust the resource specifications to match your job's requirements and your cluster's policies.
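For example, a job that needs more than a single core might request something like the following (the values are illustrative; check the limits and partitions your cluster actually offers):
```bash
#SBATCH --ntasks=1             # one task (one program instance)
#SBATCH --cpus-per-task=4      # four CPU cores for a multithreaded program
#SBATCH --mem=8G               # total memory for the job
#SBATCH --time=02:00:00        # two hours of wall-clock time
```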
**Modules and Environment**: The environment setup in the script, e.g. via `module load`, depends on the software and modules available on the cluster.
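To find out which modules your cluster provides, the usual environment-modules commands are helpful (module names such as `python/3.8` are site-specific):
```bash
# List all modules available on the cluster
module avail

# Show the modules currently loaded in your session
module list

# Load a module; replace python/3.8 with a version offered on your cluster
module load python/3.8
```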
## 2. Submitting the Job using `sbatch`
To submit the job, use the `sbatch` command:
```bash
sbatch example_job.sh
```
This command sends your job script to the Slurm scheduler, which queues it for execution based on your script's resource requirements and the resources currently available.
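If the submission is accepted, `sbatch` prints the ID that the scheduler assigned to your job (the number shown here is just an example):
```bash
$ sbatch example_job.sh
Submitted batch job 123456
```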
## 3. Monitoring the Job
After submitting, you can monitor the job's status and its place in the queue.
- **Check Job Status**: Use the `squeue` command to see all running and queued jobs. To see only your jobs:
```bash
squeue -u your_username
```
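The output is a table with one row per job; the `ST` column shows the job state (e.g. `PD` for pending, `R` for running). A roughly illustrative example (the partition and node names are made up, and the exact columns can vary per site):
```bash
$ squeue -u your_username
  JOBID PARTITION       NAME       USER ST  TIME NODES NODELIST(REASON)
 123456     debug my_test_jo your_usern  R  0:42     1 node001
```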
- **View Job Information**: To get more detailed information about a specific job, use:
```bash
scontrol show job your_job_id
```
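`scontrol show job` prints a long list of `Key=Value` fields; `JobState` and `Reason` are usually the most useful when a job is not starting. A trimmed, illustrative example:
```bash
$ scontrol show job 123456
JobId=123456 JobName=my_test_job
   UserId=your_username(1000) ...
   JobState=PENDING Reason=Priority ...
   RunTime=00:00:00 TimeLimit=00:10:00 ...
```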
- **Canceling a Job**: If you need to cancel a job, use:
```bash
scancel your_job_id
```
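`scancel` also accepts filters, which can be handy when you have several jobs queued; for example:
```bash
# Cancel a single job by its ID
scancel 123456

# Cancel all of your own jobs (use with care)
scancel -u your_username
```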
- **Checking Output**: The output of your job (if any) will be written to the file specified in the `--output` directive of your script (in this case, `result.txt`).
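You can inspect that file directly from the login node, for example:
```bash
# Print the job's output once it has been written
cat result.txt

# Or follow the output while the job is still running
tail -f result.txt
```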