Commit e0708af5 authored by Mani Tofigh


Finalized about-star.md and added hardware info table. Added some links and made some minor changes to some sentences in quickstart.md
parent 3baf8cc8
@@ -10,12 +10,11 @@ Batch jobs allow users to execute tasks without direct interaction with the comp
BATCH directives are essentially instructions embedded at the beginning of a batch job script and are interpreted by the scheduler (like Slurm in our case). These lines are prefixed with `#SBATCH` for Slurm and inform the scheduler about the resources needed for the job and any other execution preferences.
Here is a list of common directives: <br>
* `#SBATCH --nodes=<some-value>`: Requests a specific number of nodes to run your job on.
* `#SBATCH --mem=<some-value>`: Specifies the amount of RAM required.
* `#SBATCH --time=<some-value>`: Sets the maximum runtime.
* `#SBATCH --output=<some-value>`: Directs the job's output to a specific file. <br><br>
**Note:** These bullets are just meant to give you a basic understanding of the topic. Complete examples and line-by-line explanations are provided further down on this page.
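To make the syntax concrete, here is a minimal sketch of how such directives look at the top of a script; the values below are arbitrary examples chosen for illustration, not recommendations:
```bash
#!/bin/bash
#SBATCH --nodes=1            # run on a single node
#SBATCH --mem=4G             # request 4 GB of RAM
#SBATCH --time=01:00:00      # terminate the job after one hour
#SBATCH --output=result.out  # write standard output to result.out
```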
### Queues and partitions
@@ -28,8 +27,11 @@ Our cluster is partitioned into the following categories: <br>
Choosing the right partition ensures your job is queued in an environment suited to its needs, and can potentially reduce wait times.
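If you are not sure which partitions exist on the cluster or what their limits are, you can list them from the login node and then request one in your batch script. A minimal sketch (`<partition-name>` is a placeholder for one of the names reported by `sinfo`):
```bash
# On the login node: list partitions, their time limits, and node states
sinfo

# In your batch script: submit the job to a specific partition
#SBATCH --partition=<partition-name>
```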
### Basic batch job example
In this example we are going to run a Python program with specified resource limits through our batch script. <br>
Create two files, one containing your `.sbatch` script and the other containing your `.py` program: <br>
I'm going to name them `my_script.sbatch` and `my_script.py`. <br>
Then add the following to `my_script.sbatch`:
```bash
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --output=test_job.out
#SBATCH --error=test_job.err
#SBATCH --nodes=1
#SBATCH --time=10:00
#SBATCH --mem=1G

module load python3
python3 my_script.py
```
And add a simple `print` statement to `my_script.py`:
```python
print("Hello World!")
```
**NOTE:** In this example, it's assumed that both your batch and Python script files are in the same directory. If that is not the case, make sure you provide the full path (starting from the root directory `/`) to your Python file inside your script. For example:
```bash
python3 /path/to/python_script/my_script.py
```
Now let's walk through `my_script.sbatch` line by line to see what each directive does.
* `#!/bin/bash`: This line needs to be included at the start of **all** your batch scripts. It specifies that the script should be run with the `bash` shell.
Lines 2-7 are your `SBATCH` directives. These lines are where you specify different options for your job, including its name, the path/name of its output and error files, the list of nodes you want to use, resource limits, and more if required. Let's walk through them line by line:
* `#SBATCH --job-name=test_job`: This directive gives your job a name that you can later use to track and manage your job more easily when looking for it in the queue (a brief example of submitting and tracking the job follows this list). In this example, we've called it `test_job`. You can read about job management at `/software/env-modules.html`.
* `#SBATCH --output=test_job.out`: Specifies where your output file is generated and what it's going to be named. In this example, we have only provided a filename, not a path. When you use the `--output` directive with just a filename, Slurm stores the output file in the working directory from which the `sbatch` command was executed.
* `#SBATCH --error=test_job.err`: Functions similarly to `--output`, except it contains the error messages generated during the execution of your job, if any. **The `.err` file is always generated even if your job executes successfully; however, it will be empty if there are no errors.**
* `#SBATCH --nodes=1`: Asks for your job to run on one available node. This directive basically tells the scheduler, "Run my job on any available node you find; I don't care which one." **It's also possible to specify the name of the node(s) you'd like to use, which we will cover in future examples.**
* `#SBATCH --time=10:00`: Specifies how long you want your job to run after it leaves the queue and starts execution. In this case, the job will be **terminated** after 10 minutes. Acceptable time formats include `mm`, `mm:ss`, `hh:mm:ss`, `days-hh`, `days-hh:mm` and `days-hh:mm:ss`.
* `#SBATCH --mem=1G`: Specifies the maximum main memory required *per* node. In this case we set the cap to 1 gigabyte. If you don't specify a memory unit, Slurm assumes megabytes: `#SBATCH --mem=4096` requests 4096 MB of RAM. **If you want to request all the memory on a node, you can use `--mem=0`.**
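With the script and directives in place, submitting and monitoring the job looks roughly like the sketch below; these are standard Slurm client commands, and the file names match the example above:
```bash
# Submit the batch script to the scheduler
sbatch my_script.sbatch

# Check your queued and running jobs, either by user or by job name
squeue -u $USER
squeue --name=test_job

# After the job finishes, inspect the output and error files
cat test_job.out
cat test_job.err
```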
@@ -21,11 +21,34 @@ The cluster also supports various software applications tailored to different ne
### Compute Nodes
* Two HPE Apollo 6500 Gen10+ nodes, *each* containing 8 NVIDIA A100 SXM GPUs.
* One HPE ProLiant DL385 Gen10+ v2 node, containing 2 NVIDIA A30 GPUs.
#### HPE Apollo 6500 Gen10
| Attribute \ Node Name | gpu1 | gpu2 |
|------------------------|----------------------------------|----------------------------------|
| Model Name | HPE ProLiant XL675d Gen10 Plus; Apollo 6500 Gen10 Plus Chassis | HPE ProLiant XL675d Gen10 Plus; Apollo 6500 Gen10 Plus Chassis |
| Sockets | 2 | 2 |
| Cores per Socket | 32 | 32 |
| Threads per Core | 2 | 2 |
| Memory | 1024 GiB Total Memory (16 x 64GiB DIMM DDR4) | 1024 GiB Total Memory (16 x 64GiB DIMM DDR4) |
| Local Storage (Scratch space) | 407 GB | 407 GB |
#### HPE DL385 Gen10
| Attribute \ Node Name | cn01 |
|------------------------|-------------------------------------------|
| Model Name | HPE ProLiant DL385 Gen10 Plus v2 |
| Sockets | 2 |
| Cores per Socket | 32 |
| Threads per Core | 2 |
| Memory | 256GiB Total Memory (16 x 16GiB DIMM DDR4)|
| Local Storage | 854 GB |
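If you want to verify these specifications on the cluster itself, Slurm can report what it knows about each node. A short sketch, assuming the node names listed in the tables above (`gpu1`, `gpu2`, `cn01`):
```bash
# Show the CPUs, memory, and state Slurm has recorded for a node
scontrol show node gpu1

# Summarize CPU and memory information for all nodes
sinfo -N -l
```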
### Storage System
Our storage system consists of four HPE PFSS nodes, collectively offering a total of 63TB of storage. You can think of these four nodes as one unified 63TB storage unit, as it is a **Parallel File System Storage** component. These nodes work in parallel and are all mounted under **one** mount point (`/fs1`), on the GPU nodes only.
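A quick way to confirm that the parallel file system is mounted and to see how much of the 63TB is currently in use is to check the mount point from one of the GPU nodes (a minimal sketch; `/fs1` is the mount point mentioned above):
```bash
# Report the size, usage, and free space of the parallel file system
df -h /fs1
```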
## Our vision
@@ -33,7 +56,7 @@ Making complex and time-intensive calculations simple and accessible.
### Our Goal
Our heart is set on creating a community where our cluster is a symbol of collaboration and discovery. We wish to provide a supportive space where researchers and students can express their scientific ideas and explore uncharted areas. We aim to make the complicated world of computational research a shared path of growth, learning, and significant discoveries for those who are eager to learn.
## Operations Team
@@ -6,7 +6,7 @@ sort: 2
## Account Access
A Star HPC account is required to access and submit jobs to the Star HPC cluster.
The application process may require justification of the need for HPC resources, detailing the kind of work you intend to do, the resources you expect to use, and sometimes, the anticipated outcomes of your research.
@@ -28,7 +28,7 @@ Members of Hofstra University, Nassau Community College, or Adelphi University,
### Requesting an account
To get an account on Star, you need to complete the registration form at the [Star Account Management Web Application](http://localhost:3000). There, you will need to provide the following information:
- Your full name, date of birth, and nationality.
- Your position (master student, PhD, PostDoc, staff member,
@@ -65,7 +65,7 @@ Submit the above information through the online registration form.
## Login node
Access to the cluster is provided through SSH [(What is SSH?)](https://www.youtube.com/watch?v=qWKK_PNHnnA&ab_channel=Tinkernut) to the login node. The login node serves as the gateway or entry point to the cluster. It is important to understand that the login node is not for running computationally intensive tasks itself. Instead, it is for tasks such as file management, editing, and job submission. The actual computational work is done on the compute nodes, which you access indirectly by submitting jobs through Slurm, the job scheduling system.
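Connecting is done with a standard SSH client from your own machine. A minimal sketch; `<username>` and `<login-node-address>` are placeholders for the credentials and hostname provided with your Star HPC account:
```bash
# Connect to the Star HPC login node (replace the placeholders)
ssh <username>@<login-node-address>
```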
## Scheduler policies
......