Commit c6d99177 authored by Alexander Rosenberg's avatar Alexander Rosenberg

Merge branch 'master' into…

Merge branch 'master' into 5-add-a-question-and-answer-to-the-faq-regarding-root-access-on-the-cluster
parents 3cf6e1e5 599902fd
......@@ -17,12 +17,18 @@ code examples are provided under the [MIT](https://opensource.org/licenses/MIT)
### Install build tools and dependencies.
<details>
<summary>Liquid 4.0.3</summary>
> [!WARNING]
> Due to Liquid not being updated to work with Ruby 3.2.x, **make sure you have Ruby 3.1.x or older installed**.
> If you have the latest dependencies installed, the following does not apply anymore.
>
> The original `jekyll-rtd-theme` 2.0.10 required `github-pages` 209, which effectively capped the version of Liquid to 4.0.3.
> Because Liquid 4.0.3 and older were not updated to work with Ruby 3.2.x, Ruby 3.1.x or older was required.
> https://talk.jekyllrb.com/t/liquid-4-0-3-tainted/7946/18
>
> #### With Cygwin
> As of this writing (8/8/2024), Cygwin provides Ruby versions 2.6.4-1 and 3.2.2-2. Make sure to install the former. Additionally, the version of bundler supplied with Ruby 2.6 is too old and the version of RubyGems is too new. *After installing the following dependencies*, you must then install the correct versions of RubyGems and bundler manually:
> As of 8/8/2024, Cygwin provided Ruby versions 2.6.4-1 and 3.2.2-2; you would need to install the former. Because the version of bundler supplied with Ruby 2.6 is too old and the version of RubyGems is too new, the correct versions of RubyGems and bundler would need to be installed manually after installing all the other dependencies:
> ```
> gem update --system 3.2.3
> gem install bundler -v 2.1.4
......@@ -31,6 +37,8 @@ code examples are provided under the [MIT](https://opensource.org/licenses/MIT)
> bundler -v
> ```
</details>
To allow building of native extensions, install `ruby-devel`, `gcc`, and `make`.
Install `libxml2`, `libxml2-devel`, `libxslt`, `libxslt-devel`, `libiconv`,
......@@ -58,6 +66,10 @@ want to try running `bundle update` or removing `Gemfile.lock` and then running
git clone https://github.com/starhpc/docs.git star-docs
cd star-docs
gem install bundler
bundle config set --local path ~/.bundler # Optionally specify where to install gems (SO Q&A #8257833).
# Otherwise, bundler may attempt to install gems system-wide,
# e.g. /usr/share/gems, depending on your GEM_HOME
# (see SO Q&A #11635042 and #3408868).
bundle install
bundle exec jekyll serve
```
......@@ -63,14 +63,37 @@ Yes. Please see `/software/python_r_perl`.
### How can I check my disk quota and disk usage?
repquota prints a summary of the disc usage and quotas for the specified file systems.
To check the disk quota of your home directory (`/home/username`), you can use the `repquota` command, which prints a summary of the disk usage and quotas for the specified file systems.
```
$ /usr/sbin/repquota -a -s
                        Block limits                 File limits
User            used    soft    hard  grace    used  soft  hard  grace
cchave6   --    116M   1024M   1280M            1922     0     0
```
For the home directory, where the file system is ext4, the quota information is stored in files named `aquota.user` and `aquota.group` at the root of the file system.
Here,

- **Soft limit**: a warning threshold. A user can exceed this limit temporarily, but must reduce usage back under it within a "grace period."
- **Hard limit**: the absolute maximum disk space or number of files a user can use. The user cannot exceed this limit at all.
- **Grace period**: the amount of time a user is allowed to exceed the soft limit before being required to get back under it. If this period expires, the soft limit is enforced like a hard limit.
- **File limits (inodes)**: limits on the number of files a user can create, regardless of their size.
To check the quota of the main project storage (parallel file system, `/fs1/proj/<project>`), you can use this command:

```
$ mmlsquota -j <fileset_name> <filesystem_name>
```

The `-j` option specifies that you are querying a fileset. Filesets in GPFS are similar to directories that can have independent quota limits.

- `fileset_name`: the name of the fileset whose quota you want to check.
- `filesystem_name`: the name of the GPFS file system in which the fileset resides.

For example: `mmlsquota -j project_fileset gpfs1`
### How many CPU hours have I spent?
......@@ -9,4 +9,4 @@ Please see the [Quick Start Guide]({{site.baseurl}}{% link quickstart/quickstart
## Getting help
First, please read [how to write good support requests]({{site.baseurl}}{% link help/writing-support-requests.md %}). Then shoot us an email.
First, please read [how to write good support requests]({{site.baseurl}}{% link help/writing-support-requests.md %}). Then [contact us]({{site.baseurl}}{% link help/contact.md %}).
......@@ -14,20 +14,24 @@ Apptainer (formerly Singularity) is also a major application on the cluster. App
The cluster also supports various software applications tailored to different needs: Python and R for data analysis, MATLAB for technical computing, Jupyter for interactive projects, and OpenMPI for parallel computing. Anaconda broadens these capabilities with packages for scientific computing, while NetCDF manages large datasets. For big data tasks, Hadoop/Spark offers powerful processing tools.
## Hardware
### Login Node
### Compute Nodes
* Two Apollo 6500 Gen10+ HPE nodes, *each* containing 8 NVIDIA A100 SXM GPUs.
* One HPE ProLiant DL385 Gen10+ v2, containing 2 A30 SXM NVIDIA GPUs.
- Two Apollo 6500 Gen10+ HPE nodes, _each_ containing 8 NVIDIA A100 SXM GPUs.
- One HPE ProLiant DL385 Gen10+ v2, containing 2 A30 SXM NVIDIA GPUs.
- Two XL675d Gen10+ servers (Apollo 6500 Gen10+ chassis), _each_ containing 8 NVIDIA A100 SXM4 GPUs.
- One HPE DL385 Gen10+ v2 with 2 A30 PCIe GPUs.
- Two HPE DL380a Gen11 servers, _each_ containing 2 NVIDIA H100 80GB GPUs.
- Two Cray XD665 nodes, _each_ containing 4 NVIDIA HGX H100 80GB GPUs.
- One Cray XD670 node, containing 8 NVIDIA HGX H100 80GB GPUs.
#### HPE Apollo 6500 Gen10
| Attribute\Node Name | gpu1 | gpu2 |
|------------------------|----------------------------------|----------------------------------|
| ----------------------------- | -------------------------------------------------------------- | -------------------------------------------------------------- |
| Model Name | HPE ProLiant XL675d Gen10 Plus; Apollo 6500 Gen10 Plus Chassis | HPE ProLiant XL675d Gen10 Plus; Apollo 6500 Gen10 Plus Chassis |
| Sockets | 2 | 2 |
| Cores per Socket | 32 | 32 |
......@@ -36,20 +40,83 @@ The cluster also supports various software applications tailored to different ne
| GPU | 8 SXM NVIDIA A100s | 8 SXM NVIDIA A100s |
| Local Storage (Scratch space) | 407GB | 407GB |
#### HPE DL385 Gen10
| Attribute\Node Name | cn01 |
|------------------------|-------------------------------------------|
| ----------------------------- | ------------------------------------------ |
| Model Name | HPE ProLiant DL385 Gen10 Plus v2 |
| Sockets | 2 |
| Cores per Socket | 32 |
| Threads per Core | 2 |
| Memory | 256GiB Total Memory (16 x 16GiB DIMM DDR4)|
| Memory | 256GiB Total Memory (16 x 16GiB DIMM DDR4) |
| GPU                           | 2 PCIe NVIDIA A30s                         |
| Local Storage (Scratch Space) | 854GB                                      |
#### XL675d Gen10+ (Apollo 6500 Chassis)
| Attribute\Node Name | gpu4 | gpu5 |
| ----------------------------- | -------------------------------------- | -------------------------------------- |
| Model Name | HPE ProLiant XL675d Gen10 Plus Chassis | HPE ProLiant XL675d Gen10 Plus Chassis |
| Sockets | 2 (AMD EPYC 7513 @ 2.60 GHz) | 2 (AMD EPYC 7513 @ 2.60 GHz) |
| Cores per Socket              | 32 (64 physical cores total)            | 32 (64 physical cores total)            |
| Threads per Core | 2 (128 Logical Cores) | 2 (128 Logical Cores) |
| Memory | 1024 GiB DDR4 3200 RAM | 1024 GiB DDR4 3200 RAM |
| GPU | 8 NVIDIA A100 80GB SXM4 GPUs | 8 NVIDIA A100 80GB SXM4 GPUs |
| Local Storage (Scratch Space) | 2x 480GB SSD | 2x 480GB SSD |
#### HPE DL385 Gen10+ v2
| Attribute\Node Name | cn02 |
| ----------------------------- | -------------------------------- |
| Model Name | HPE ProLiant DL385 Gen10 Plus v2 |
| Sockets | 2 (AMD EPYC 7513 @ 2.60 GHz) |
| Cores per Socket              | 32 (64 physical cores total)     |
| Threads per Core | 2 (128 Logical Cores) |
| Memory | 256GiB DDR4 RAM |
| GPU | 2 NVIDIA A30 24GB HBM2 PCIe GPUs |
| Local Storage (Scratch Space) | 854GB                            |
#### HPE DL380a Gen11
| Attribute\Node Name | gpu6 | gpu7 |
| ----------------------------- | -------------------------------------------- | -------------------------------------------- |
| Model Name | HPE DL380a Gen11 | HPE DL380a Gen11 |
| Sockets | 2 (Intel Xeon-P 8462Y+ @ 2.8GHz) | 2 (Intel Xeon-P 8462Y+ @ 2.8GHz) |
| Cores per Socket              | 32 (64 physical cores total)                 | 32 (64 physical cores total)                 |
| Threads per Core | 2 (128 Logical Cores) | 2 (128 Logical Cores) |
| Memory | 512 GiB DDR5 RAM | 512 GiB DDR5 RAM |
| GPU | 2 NVIDIA H100 80GB GPUs (NVAIE subscription) | 2 NVIDIA H100 80GB GPUs (NVAIE subscription) |
| Network | 4-port GbE, 1-port HDR200 InfiniBand | 4-port GbE, 1-port HDR200 InfiniBand |
| Local Storage (Scratch Space) | 1TB SSD | 1TB SSD |
#### Cray XD665 Nodes
| Attribute\Node Name | cray01 | cray02 |
| ----------------------------- | -------------------------------------- | -------------------------------------- |
| Model Name | Cray XD665 | Cray XD665 |
| Sockets | 2 (AMD EPYC Genoa 9334 @ 2.7GHz) | 2 (AMD EPYC Genoa 9334 @ 2.7GHz) |
| Cores per Socket              | 32 (64 physical cores total)            | 32 (64 physical cores total)            |
| Threads per Core | 2 (128 Logical Cores) | 2 (128 Logical Cores) |
| Memory | 768 GiB DDR5 RAM | 768 GiB DDR5 RAM |
| GPU | 4 NVIDIA HGX H100 80GB SXM GPUs | 4 NVIDIA HGX H100 80GB SXM GPUs |
| Network | 2-port 10GbE, 1-port HDR200 InfiniBand | 2-port 10GbE, 1-port HDR200 InfiniBand |
| Local Storage (Scratch Space) | 1TB SSD | 1TB SSD |
#### Cray XD670 Node
| Attribute\Node Name | cray03 |
| ----------------------------- | -------------------------------------- |
| Model Name | Cray XD670 |
| Sockets | 2 (Intel Xeon-P 8462Y+ @ 2.8GHz) |
| Cores per Socket              | 32 (64 physical cores total)           |
| Threads per Core | 2 (128 Logical Cores) |
| Memory | 2048 GiB DDR5 RAM |
| GPU | 8 NVIDIA HGX H100 80GB SXM GPUs |
| Network | 2-port 10GbE, 1-port HDR200 InfiniBand |
| Local Storage (Scratch Space) | 2TB SSD |
### Storage System
Our storage system consists of four HPE PFSS nodes, collectively offering a total of 63TB of storage. You can think of these four nodes as one unified 63TB storage unit, as they form a **Parallel File System Storage** component. These nodes work in parallel and are all mounted under **one** mount point (`/fs1`) on the GPU nodes only.
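For a quick view of the capacity and current usage of this file system, you can check the mount point directly from a GPU node, for example (a minimal sketch; the device name reported will depend on the PFSS configuration):

```
# Show the size, usage, and mount point of the parallel file system
df -h /fs1
```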
## Our vision
......@@ -60,19 +127,17 @@ Making complex and time-intensive calculations simple and accessible.
Our heart is set on creating a community where our cluster is a symbol of collaboration and discovery. We wish to provide a supportive space where researchers and students can express their scientific ideas and explore uncharted areas. We aim to make the complicated world of computational research a shared path of growth, learning, and significant discoveries for those eager to learn.
## Operations Team
* Alexander Rosenberg
* Mani Tofigh
- Alexander Rosenberg
- Mani Tofigh
## The Board
* Edward H. Currie
* Daniel P. Miller
* Adam C. Durst
* Jason D. Williams
* Thomas G. Re
* Oren Segal
* John Ortega
- Edward H. Currie
- Daniel P. Miller
- Adam C. Durst
- Jason D. Williams
- Thomas G. Re
- Oren Segal
- John Ortega
......@@ -87,5 +87,5 @@ Project-specific directories may be created upon request for shared storage amon
To make proper use of the cluster, please familiarize yourself with the basics of using Slurm, fundamental HPC concepts, and the cluster's architecture.
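As a starting point, a minimal batch job could look like the sketch below (the job name, resource values, and output file are placeholders, and no partition is specified; adjust them to your workload and to the partitions actually available on the cluster):

```
#!/bin/bash
#SBATCH --job-name=hello          # A short name for the job
#SBATCH --ntasks=1                # Number of tasks (processes)
#SBATCH --cpus-per-task=1         # CPU cores per task
#SBATCH --mem=1G                  # Memory for the job
#SBATCH --time=00:05:00           # Wall-clock limit (HH:MM:SS)
#SBATCH --output=hello_%j.out     # Standard output file (%j = job ID)

# Print the compute node the job ran on
hostname
```

Save it to a file (e.g., `hello.sbatch`), submit it with `sbatch hello.sbatch`, and check its status with `squeue -u $USER`.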
You may be familiar with the `.bashrc`, `.bash_profile`, or `.cshrc` files for environment customization. To support different environments needed for different software packages, environment modules are used. Modules allow you to load and unload various software environments tailored to your computational tasks.
You may be familiar with the `.bashrc`, `.bash_profile`, or `.cshrc` files for environment customization. To support different environments needed for different software packages, [environment modules]({{site.baseurl}}{% link software/env-modules.md %}) are used. Modules allow you to load and unload various software environments tailored to your computational tasks.
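For example, a typical module workflow looks something like the following (the module name used here is only illustrative; run `module avail` to see what is actually installed on the cluster):

```
module avail             # List the modules available on the cluster
module load python       # Load a module into your environment (illustrative name)
module list              # Show the modules currently loaded
module unload python     # Unload a single module
module purge             # Unload all loaded modules
```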
......@@ -48,6 +48,20 @@ Rsync is a particularly useful tool and is recommended for transferring files to
When transferring very large files or datasets, it is advised to use rsync and to calculate and confirm checksums to ensure data integrity.
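For example, a large transfer with checksum verification could look like the following sketch (the hostname and paths are placeholders):

```
# Copy a dataset to the cluster, preserving attributes, resuming partial
# transfers, and showing progress
rsync -avP ./dataset/ username@cluster.example.edu:/path/to/project/dataset/

# Generate checksums locally, copy them over, and verify them on the remote side
find ./dataset -type f -exec sha256sum {} \; > dataset.sha256
rsync -avP dataset.sha256 username@cluster.example.edu:/path/to/project/
ssh username@cluster.example.edu 'cd /path/to/project && sha256sum -c dataset.sha256'
```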
## Cyberduck
Cyberduck is a file transfer application with an intuitive graphical interface for transferring files to or from a remote machine. Cyberduck is available for both Windows and Mac. Download it from [cyberduck.io](https://cyberduck.io/).
Click "Open Connection" and a new window like the one below will be displayed. Select "SFTP (SSH File Transfer Protocol)" from the top dropdown menu. Enter the server, port number, your username, and Linux Lab password. Then click "Connect".
![3-connection.png]({{ site.baseurl }}/images/cyberduck_setup_images/3-connection.png "3-connection.png")
If you see a window asking about an "Unknown fingerprint", click "Always" and then "Allow".
![4-fingerprint.png]({{ site.baseurl }}/images/cyberduck_setup_images/4-fingerprint.png "4-fingerprint.png")
You should now be able to see your user's home directory on the cluster. You can transfer files to and from it by dragging and dropping files between this window and your local file browser (e.g., "Finder") windows.
## Network Interfaces and Bandwidth
All file transfer access to the Star HPC Cluster is currently through the login node's 1GbE interface. Users should be aware of potential bandwidth limitations, especially when transferring large amounts of data.
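If you need to keep a long-running transfer from saturating this shared link, rsync can throttle itself; for example (the rate and paths are illustrative):

```
# Limit the transfer to roughly 50 MB/s (--bwlimit is in units of 1024 bytes/s)
rsync -avP --bwlimit=50000 ./large_dataset/ username@cluster.example.edu:/path/to/project/
```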
......@@ -55,3 +69,4 @@ All file transfer access to the Star HPC Cluster is currently through the login
## User Authentication and Permissions
File transfers are authenticated in the same way as SSH access. SSH keys are the preferred method for secure authentication, although password authentication is currently allowed. Multi-Factor Authentication (MFA) is being considered as a future security enhancement.
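If you have not set up key-based authentication yet, it generally takes two steps (the hostname below is a placeholder for the login node's address):

```
# Generate a key pair (press Enter to accept the default location; choose a passphrase)
ssh-keygen -t ed25519

# Install the public key on your cluster account
ssh-copy-id username@cluster.example.edu

# SSH, SFTP, and rsync connections can now authenticate with the key
ssh username@cluster.example.edu
```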