Commit 78d673af authored by Mani Tofigh

1) Added file submitting-jobs.md and added an intro to it. 2) Added Jupyter Notebook Example for Batch script samples
parent 95c63b6b
@@ -71,5 +71,50 @@ After the last `#SBATCH` directive, commands are ran like any other regular shel
* `module load python3`: Loads the modules needed for the `python3` command to be available. See `/software/env-modules.html` for more detail on how the `module` command works.
* `python3 my_script.py`: Like any other `python3` command, this line runs `my_script.py` with Python. **The output(s) and/or error(s) of this command are written to the files we specified in our directives.**
#### Submit the job
As discussed previously, this script is a non-interactive job. Non-interactive jobs are submitted to the queue with the `sbatch` command; in this case, we submit our job with `sbatch my_script.sbatch`.
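If you want to script the submission, for example to keep the job ID for later commands, the sketch below is one way to do it. It assumes you run it on a login node where Slurm's `sbatch` and `squeue` are on your `PATH`; the helper names `job_id_from_parsable` and `submit_job` are hypothetical and not part of the Star documentation. It relies on `sbatch --parsable`, which prints only the job ID (followed by `;<cluster>` on multi-cluster setups) instead of the usual `Submitted batch job ...` line.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Star docs): submit a batch script and
# report its job ID. Assumes Slurm's sbatch and squeue are on your PATH.

# `sbatch --parsable` prints "<jobid>" or "<jobid>;<cluster>";
# keep only the part before any semicolon.
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

submit_job() {
    local out
    # On failure sbatch prints its own error message and returns non-zero.
    out=$(sbatch --parsable "$1") || return 1
    job_id_from_parsable "$out"
}

# Submit only when Slurm is available and a script path was given.
if command -v sbatch >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    jobid=$(submit_job "$1") || exit 1
    echo "Submitted job ${jobid}"
    # List the job; the ST column shows PD while pending and R while running.
    squeue -j "$jobid"
fi
```

Saved under a name of your choice, say `submit.sh` (hypothetical), it would be run as `bash submit.sh my_script.sbatch`. Plain `sbatch my_script.sbatch` is all you need for one-off submissions.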
### Jupyter Notebook batch job example
There is no Graphical User Interface (GUI) available when you connect to the cluster through your shell, so to use an application's GUI you need port forwarding [(What is SSH port forwarding?)](https://www.youtube.com/watch?v=x1yQF1789cE&ab_channel=TonyTeachesTech). In this example, we use port forwarding to access Jupyter Notebook's web portal. Your data is sent and received through a specified port on your local machine, which is tunneled to the port on the cluster node where the Jupyter Notebook server is running. This lets you work with Jupyter Notebooks as if they were running locally, even though they actually run on a remote cluster node. Once the tunnel is set up, you can open Jupyter's portal in any browser **on your local machine** using the link that Jupyter generates.
Create your sbatch script file. I'm going to call mine `jupyterTest.sbatch`. Then add the following to it:
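```bash
#!/bin/bash
#SBATCH --nodelist=cn01
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --job-name=jupyterTest1
# Slurm does not create missing directories: /home/mani/outputs must exist before you submit
#SBATCH --output=/home/mani/outputs/jupyterTest1.out
#SBATCH --error=/home/mani/outputs/jupyterTest1.err

# Clear the runtime directory so Jupyter does not try to use /run/user/<uid>,
# which may not exist or be writable on the compute node
export XDG_RUNTIME_DIR=""

# Get tunneling info: compute node's short host name, your user name, and the port
node=$(hostname -s)
user=$(whoami)
port=9001

# Print tunneling instructions to the job's output file (jupyterTest1.out)
echo -e "
Command to create ssh tunnel:
ssh -p5010 -N -f -L ${port}:${node}:${port} ${user}@binary.star.hofstra.edu
Use a browser on your local machine to go to:
localhost:${port}"

# Make the jupyter command available
module load jupyter

# Run Jupyter without opening a browser, listening on the compute node's address.
# If the port is already in use, Jupyter may pick another one, so check
# jupyterTest1.err for the URL it actually reports.
jupyter notebook --no-browser --port=${port} --ip=${node}
```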
Most of the directives at the start of the script were already discussed in "Basic batch job example", so we only cover the new ones here:
* `--ntasks=1`: Tells Slurm to allocate resources for one task. A "task" here is one running instance (process) of your application or script. For applications that don't split their work across multiple CPUs or nodes, a single task is enough. If your work can run as several independent processes, you can request more tasks; for instance, running several instances of a data analysis script, each on a different subset of your data. Note that requesting more tasks does not by itself run your commands more than once: the batch script itself runs once, and commands launched with `srun` start one copy per task.
* `--cpus-per-task=1`: Sets the number of CPUs allocated to each task requested with `--ntasks`. Slurm's default is one CPU per task, which suits work that is not CPU-intensive or runs on a single thread. For multi-threaded applications that can use several CPU cores at once, increase this value to match the number of threads the application will actually use.
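To see how these two directives show up inside a running job, here is a small sketch. The job name `tasksDemo` and the values `--ntasks=4` and `--cpus-per-task=2` are hypothetical. It reads `SLURM_NTASKS` and `SLURM_CPUS_PER_TASK`, which Slurm exports inside the job (falling back to 1 so the script also runs outside Slurm), and sets `OMP_NUM_THREADS` as one common way to tell a multi-threaded (e.g. OpenMP) program how many threads to use. The `srun` step only runs inside a job and starts one copy of the command per task, each with its own `SLURM_PROCID`.

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --time=00:10:00
#SBATCH --job-name=tasksDemo

# Slurm exports these inside the job; fall back to 1 when run outside Slurm.
ntasks=${SLURM_NTASKS:-1}
cpus=${SLURM_CPUS_PER_TASK:-1}

# Let a multi-threaded (e.g. OpenMP) program use exactly the CPUs each task was given.
export OMP_NUM_THREADS="$cpus"

echo "tasks=${ntasks} cpus_per_task=${cpus} total_cpus=$(( ntasks * cpus ))"

# The lines above run once. srun starts one copy per task; each copy sees its
# own SLURM_PROCID (0..ntasks-1), which could be used to pick a data subset.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun bash -c 'echo "task ${SLURM_PROCID} of ${SLURM_NTASKS} on $(hostname -s)"'
fi
```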
Don't worry, we will discuss parallel and multi-threaded jobs in more detail throughout this documentation with real-world examples.
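Once the Jupyter job is running, the link Jupyter prints usually names the compute node (for example `http://cn01:9001/...?token=...`), which your local browser cannot resolve; through the tunnel, the same path and token are reached at `localhost`. The sketch below assumes Jupyter's log ends up in the job's `--error` file (`jupyterTest1.err` in the example) and that the link has that `http://<node>:<port>/...token=...` form; both are assumptions, and the helper names are hypothetical. Run it on the cluster, then paste the printed URL into your local browser after opening the tunnel.

```shell
#!/bin/bash
# Hypothetical helper: turn Jupyter's logged link into one usable on your
# local machine. Assumes the link has the form http://<node>:<port>/...token=...

# Print the first http(s) URL containing a token from a log file.
find_jupyter_url() {
    grep -Eo 'https?://[^ ]*token=[^ ]*' "$1" | head -n 1
}

# Replace the compute-node host name with localhost, keeping port, path and
# token, because the node name only resolves inside the cluster.
to_localhost_url() {
    printf '%s\n' "$1" | sed -E 's#^(https?://)[^:/]+:#\1localhost:#'
}

# Error file from the example job; pass another path as the first argument.
log="${1:-/home/mani/outputs/jupyterTest1.err}"
if [ -r "$log" ]; then
    url=$(find_jupyter_url "$log")
    if [ -n "$url" ]; then
        echo "Open in your local browser: $(to_localhost_url "$url")"
    else
        echo "No token URL in $log yet; the job may still be starting." >&2
    fi
fi
```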
---
sort: 3
---
# Submitting Jobs
In `/jobs/creating-jobs.md`, we briefly touched on how to submit specific job types with commands like `sbatch` and `srun`. Here, we focus on *how* to get the most out of your job submissions:
* What can help your jobs leave the queue faster.
* How scheduler (Slurm) policies affect your job.
@@ -45,7 +45,7 @@ The cluster also supports various software applications tailored to different ne
| Cores per Socket | 32 |
| Threads per Core | 2 |
| Memory | 256GiB Total Memory (16 x 16GiB DIMM DDR4)|
| Local Storage (Scratch Space) | 854G |
### Storage System
Our storage system consists of four HPE PFSS nodes, collectively offering a total of 63TB of storage. Because it is a **Parallel File System Storage** component, you can think of these four nodes as one unified 63TB storage unit. The nodes work in parallel and are all mounted under **one** mount point (`/fs1`), on the GPU nodes only.
@@ -20,7 +20,7 @@ Before you start using the Star cluster, please read Hofstra University's [Accep
A general introduction to the Star HPC cluster, research community, and support group can be found at [starhpc.hofstra.io](https://starhpc.hofstra.io).
To work on the Star cluster, you must have an approved account on the cluster.
## How to get an account on Star