Commit fca7075b authored by Mani Tofigh's avatar Mani Tofigh

| Original Feedback | Implementation Status |
|------------------|----------------------|
| Please change the actual name of the login node to a placeholder |  Changed to `<login-node>` |
| Decouple it from the batch submission guide |  Removed references to "Basic batch job example" |
| Put the content on SSH port forwarding into its own section |  Created dedicated "SSH Port Forwarding" section with subsections |
| Add instructions on how to do SSH port forwarding through an Adams 204 machine |  Added complete SSH tunnel command for Linux lab machines |
| Remind the user that the compute nodes themselves have no Internet access |  Added reminder about file transfer through login node |
| Add a link on the submitting jobs guide to the new Jupyter Notebook page |  Added link under "Interactive jobs" section |
| Briefly explain the use of srun for launching an interactive shell |  Added "Working on the Same Node" section with `srun` command |
| Add a sentence about /home/<username>/ limitations |  Added "Storage Space and Performance" section |
| Explain that additional packages can be installed through conda |  Will do this after the addition of the Conda page |
| Mention that using Docker images is also an option |  Added Container Images section with Apptainer link at the bottom |
| Please confirm the example job script is actually tested and works as expected |  I have to retest this, as I previously only tested it over the VPN and not through the Linux lab machines. |
parent dd4a7d2c
@@ -93,6 +93,8 @@ You can find more job examples where we run TensorFlow and PyTorch containers at
## Interactive jobs
For setting up interactive Jupyter Notebook sessions, see our [Jupyter Notebooks Guide]({{site.baseurl}}{% link software/jupyter-notebooks.md %}).
### Starting an Interactive job
To start an interactive job, you use the `srun` command with specific parameters that define your job's resource requirements. Here's an example:
......
# Jupyter Notebooks
Jupyter notebooks are interactive web-based environments where you can create and share documents with live code, equations, visualizations, and narrative text. They're great for data analysis, scientific computing, and machine learning tasks - you can run Python code in cells, see results right away, and document your work all in one place.
## Storage Space and Performance
While you can use `/home/<username>/` for quick experiments and individual projects, keep in mind that this path has limited storage space and performance. For most of your work, use `/fs1/projects/<project-name>/`, which lives on the parallel file-system storage. It's faster and has much more space for your notebooks and data.
## SSH Port Forwarding
The cluster's compute nodes don't have a graphical interface, so we'll use SSH port forwarding [(What is SSH port forwarding?)](https://www.youtube.com/watch?v=x1yQF1789cE&ab_channel=TonyTeachesTech) to access Jupyter's web portal. Because the login node is reachable from the Linux lab machines in Adams Hall, we'll create a tunnel through one of these machines to reach our Jupyter session.
1. The job script (shown in the next section) will generate a tunneling command in your output file
2. Run this command from your local machine to establish the connection through the Linux lab machine
3. Access Jupyter through your local web browser
## Running Jupyter Notebook
Here's a script to launch Jupyter. Save it as `jupyter.sbatch`:
```bash
#!/bin/bash
#SBATCH --nodelist=<compute-node>
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --job-name=jupyter_notebook
#SBATCH --output=/fs1/projects/<project-name>/jupyter_%j.out
#SBATCH --error=/fs1/projects/<project-name>/jupyter_%j.err
# get tunneling info
XDG_RUNTIME_DIR=""
node=$(hostname -s)
user=$(whoami)
port=9001
# print tunneling instructions to the output file
echo -e "
Run this command from your local machine to set up the tunnel:

ssh -L ${port}:localhost:${port} -p 5010 ${user}@adams204xx.hofstra.edu ssh -L ${port}:${node}:${port} ${user}@<login-node>

Replace 'xx' with a number between 01-30 to select a Linux lab machine.
"
module load jupyter
jupyter notebook --no-browser --port=${port} --ip=${node}
```
Replace `<project-name>` with your actual project directory, and `<compute-node>` with the name of the node you want the job to run on.
The script uses these SLURM settings:
- `--nodelist`: Picks which compute node to use
- `--ntasks=1`: Runs one instance of Jupyter
- `--cpus-per-task=1`: Uses one CPU
- `--time=00:30:00`: Runs for up to 30 minutes
To get started:
1. Submit the job: `sbatch jupyter.sbatch`
2. Look in your output file (`jupyter_<jobid>.out`) for the SSH tunnel command
3. Run that command from your local machine, replacing the `xx` placeholder with a number between 01-30
4. Find the Jupyter URL with token in your error file (`jupyter_<jobid>.err`)
5. Open that URL in your local computer's browser
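As a concrete sketch of step 4, here is one way to pull the access link out of the error file once the job is up. The job id `12345`, the username, and the token below are all made up for illustration; your file will contain your own values:

```shell
# Simulated excerpt of a jupyter_<jobid>.err file -- Jupyter writes its
# startup messages, including the access URL, to stderr.
cat > jupyter_12345.err <<'EOF'
[I 10:00:00 NotebookApp] Serving notebooks from local directory: /home/alice
[I 10:00:00 NotebookApp] http://127.0.0.1:9001/?token=abc123def456
EOF

# Extract just the URL (with its login token) to open in your local browser.
grep -o 'http://127\.0\.0\.1:[0-9]*/?token=[a-z0-9]*' jupyter_12345.err
```

With the tunnel from step 3 running, pasting the extracted URL into your local browser reaches the notebook server on the compute node.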
Remember: The compute nodes can't access the internet, so transfer any files you need through the login node first.
A few notes on the rest of the script:

- The variable initializations such as `node=...` and `user=...` collect information from the node your job lands on, so the script can generate the exact tunnel command for you to run **locally**. You don't need to change these.
- The `echo` command writes that ssh tunneling command to your `.out` file using those variables.
- `module load jupyter`: Loads the modules required to provide the `jupyter` command.
- `jupyter notebook --no-browser --port=${port} --ip=${node}`: Starts a Jupyter server listening on the specified port and address, so it can later be reached from your local machine's browser.
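The tunnel command is assembled from standard shell lookups, so you can preview the mechanics on any Linux machine. Note this is only a sketch: run locally, `hostname -s` and `whoami` report your own machine and account, not a compute node's, so the printed command is only meaningful when the script produces it inside a real job:

```shell
# Reproduce the script's tunnel-command generation; values come from
# whatever machine this runs on.
node=$(hostname -s)   # short hostname of the current machine
user=$(whoami)        # current username
port=9001             # arbitrary unprivileged port chosen in the script

echo "ssh -L ${port}:localhost:${port} -p 5010 ${user}@adams204xx.hofstra.edu ssh -L ${port}:${node}:${port} ${user}@<login-node>"
```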
## Working on the Same Node
Need to run commands on the node where Jupyter is running? Use `srun` to get an interactive shell:
```bash
srun --jobid=<your_jupyter_job_id> --pty bash
```
Check out [Interactive jobs]({{site.baseurl}}{% link jobs/submitting-jobs.md %}#interactive-jobs) for more details about interactive sessions.
## Adding More Packages
### Container Images
You can also use Docker images through Apptainer (formerly Singularity). This is great when you want an environment with everything pre-installed. Check out the [Apptainer Guide]({{site.baseurl}}{% link software/apptainer.md %}) to learn more.