Commit 42ff63fc authored by Alexander Rosenberg's avatar Alexander Rosenberg

Merge branch 'master' into cyberduck

parents 11c28d45 7b2e2ab0
@@ -17,12 +17,18 @@ code examples are provided under the [MIT](https://opensource.org/licenses/MIT)
### Install build tools and dependencies.
<details>
<summary>Liquid 4.0.3</summary>
> [!WARNING]
> If you have the latest dependencies installed, the following does not apply anymore.
>
> The original `jekyll-rtd-theme` 2.0.10 required `github-pages` 209, which effectively capped the version of Liquid to 4.0.3.
> Because Liquid 4.0.3 and older were never updated to work with Ruby 3.2.x, Ruby 3.1.x or older was required.
> https://talk.jekyllrb.com/t/liquid-4-0-3-tainted/7946/18
>
> #### With Cygwin
> As of 8/8/2024, Cygwin provided Ruby versions 2.6.4-1 and 3.2.2-2; you would need to install the former. Because the version of bundler supplied with Ruby 2.6 is too old and the version of RubyGems is too new, the correct versions of RubyGems and bundler would need to be installed manually after installing all the other dependencies:
> ```
> gem update --system 3.2.3
> gem install bundler -v 2.1.4
@@ -31,6 +37,8 @@ code examples are provided under the [MIT](https://opensource.org/licenses/MIT)
> bundler -v
> ```
</details>
To allow building of native extensions, install `ruby-devel`, `gcc`, and `make`.
Install `libxml2`, `libxml2-devel`, `libxslt`, `libxslt-devel`, `libiconv`,
@@ -58,6 +66,10 @@ want to try running `bundle update` or removing `Gemfile.lock` and then running
git clone https://github.com/starhpc/docs.git star-docs
cd star-docs
gem install bundler
bundle config set --local path ~/.bundler # Optionally specify where to install gems (SO Q&A #8257833).
# Otherwise, bundler may attempt to install gems system-wide,
# e.g. /usr/share/gems, depending on your GEM_HOME
# (see SO Q&A #11635042 and #3408868).
bundle install
bundle exec jekyll serve
```
@@ -63,14 +63,37 @@ Yes. Please see `/software/python_r_perl`.
### How can I check my disk quota and disk usage?
To check the disk quota of your home directory (`/home/username`), you can use the `repquota` command, which prints a summary of the disk usage and quotas for the specified file systems.
```
$ /usr/sbin/repquota -a -s
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
cchave6   --    116M   1024M   1280M           1922     0     0
```
For home directories on an ext4 file system, the quota information is stored in files named `aquota.user` and `aquota.group` at the root of the file system.
Here:

- **Soft limit**: a warning threshold. A user can exceed this limit temporarily, but must reduce usage back under it within a "grace period."
- **Hard limit**: the absolute maximum disk space or number of files a user can use. The user cannot exceed this limit at all.
- **Grace period**: the amount of time a user is allowed to exceed the soft limit before they are required to get back under it. If this period expires, the soft limit is enforced like a hard limit.
- **File limits (inodes)**: limits on the number of files a user can create, regardless of their size.
To check the quota of the main project storage (the parallel file system at `/fs1/proj/<project>`), you can use this command:

```
$ mmlsquota -j <fileset_name> <filesystem_name>
```

The `-j` option specifies that you are querying a fileset. Filesets in GPFS are similar to directories that can have independent quota limits.

- `<fileset_name>`: the name of the fileset whose quota you want to check.
- `<filesystem_name>`: the name of the GPFS filesystem in which the fileset resides.

For example: `mmlsquota -j project_fileset gpfs1`
### How many CPU hours have I spent?
@@ -9,4 +9,4 @@ Please see the [Quick Start Guide]({{site.baseurl}}{% link quickstart/quickstart
## Getting help
First, please read [how to write good support requests]({{site.baseurl}}{% link help/writing-support-requests.md %}). Then [contact us]({{site.baseurl}}{% link help/contact.md %}).
@@ -8,48 +8,115 @@ sort: 1
The Star HPC Cluster is a computing facility designed for a variety of research and computational tasks. It combines advanced computing **nodes** and a high-speed **storage system** with a suite of **software applications**.
SLURM (Simple Linux Utility for Resource Management) is our chosen job scheduler and queueing system that efficiently manages resource allocation, ensuring everyone gets the right amount of resources at the right time.
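As a quick illustration, work is typically described in a small batch script and handed to Slurm with `sbatch`. The script below is a minimal sketch; the job name, resource requests, and time limit are placeholder values, not cluster defaults.

```bash
#!/bin/bash
#SBATCH --job-name=example        # placeholder job name
#SBATCH --nodes=1                 # run on a single node
#SBATCH --ntasks=1                # one task (process)
#SBATCH --gres=gpu:1              # request one GPU, if the job needs one
#SBATCH --time=00:10:00           # wall-clock limit of 10 minutes

# The actual work: print the name of the allocated host
srun hostname
```

Submit the script with `sbatch job.sh` and monitor it with `squeue -u $USER`.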
Apptainer (formerly Singularity) is also a major application on the cluster. Apptainer is a containerization platform similar to Docker with the major difference that it runs under user privileges instead of `root`. This platform is enhanced by NGC (NVIDIA GPU Cloud) which provides access to a wide array of pre-built, GPU-optimized software containers for diverse applications. This integration saves all users a lot of time as they don’t need to set up the software applications from scratch and can just pull and use the NGC images with Apptainer.
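For example, an NGC image can be pulled and run roughly as follows; the image name and tag here are only illustrative, not a recommendation:

```bash
# Pull a GPU-optimized container image from NGC (image and tag are examples)
apptainer pull pytorch.sif docker://nvcr.io/nvidia/pytorch:24.01-py3

# Run a command inside the container; --nv exposes the host's NVIDIA GPUs
apptainer exec --nv pytorch.sif python -c "import torch; print(torch.cuda.is_available())"
```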
The cluster also supports various software applications tailored to different needs: Python and R for data analysis, MATLAB for technical computing, Jupyter for interactive projects, and OpenMPI for parallel computing. Anaconda broadens these capabilities with packages for scientific computing, while NetCDF manages large datasets. For big data tasks, Hadoop/Spark offers powerful processing tools.
## Hardware
### Login Node
### Compute Nodes
- Two Apollo 6500 Gen10+ HPE nodes, _each_ containing 8 NVIDIA A100 SXM GPUs.
- One HPE ProLiant DL385 Gen10+ v2, containing 2 A30 SXM NVIDIA GPUs.
- Two XL675d Gen10+ servers (Apollo 6500 Gen10+ chassis), _each_ containing 8 NVIDIA A100 SXM4 GPUs.
- One HPE DL385 Gen10+ v2 with 2 A30 PCIe GPUs.
- Two HPE DL380a Gen11 servers, _each_ containing 2 NVIDIA H100 80GB GPUs.
- Two Cray XD665 nodes, _each_ containing 4 NVIDIA HGX H100 80GB GPUs.
- One Cray XD670 node, containing 8 NVIDIA HGX H100 80GB GPUs.
#### HPE Apollo 6500 Gen10
| Attribute\Node Name | gpu1 | gpu2 |
| ----------------------------- | -------------------------------------------------------------- | -------------------------------------------------------------- |
| Model Name | HPE ProLiant XL675d Gen10 Plus; Apollo 6500 Gen10 Plus Chassis | HPE ProLiant XL675d Gen10 Plus; Apollo 6500 Gen10 Plus Chassis |
| Sockets | 2 | 2 |
| Cores per Socket | 32 | 32 |
| Threads per Core | 2 | 2 |
| Memory | 1024 GiB Total Memory (16 x 64GiB DIMM DDR4) | 1024 GiB Total Memory (16 x 64GiB DIMM DDR4) |
| GPU | 8 SXM NVIDIA A100s | 8 SXM NVIDIA A100s |
| Local Storage (Scratch space) | 407GB | 407GB |
#### HPE DL385 Gen10
| Attribute\Node Name | cn01 |
| ----------------------------- | ------------------------------------------ |
| Model Name | HPE ProLiant DL385 Gen10 Plus v2 |
| Sockets | 2 |
| Cores per Socket | 32 |
| Threads per Core | 2 |
| Memory | 256GiB Total Memory (16 x 16GiB DIMM DDR4) |
| GPU | 2 SXM NVIDIA A30s |
| Local Storage (Scratch Space) | 854 GB                                     |
#### XL675d Gen10+ (Apollo 6500 Chassis)
| Attribute\Node Name | gpu4 | gpu5 |
| ----------------------------- | -------------------------------------- | -------------------------------------- |
| Model Name | HPE ProLiant XL675d Gen10 Plus Chassis | HPE ProLiant XL675d Gen10 Plus Chassis |
| Sockets | 2 (AMD EPYC 7513 @ 2.60 GHz) | 2 (AMD EPYC 7513 @ 2.60 GHz) |
| Cores per Socket               | 32 (64 physical cores total)            | 32 (64 physical cores total)            |
| Threads per Core | 2 (128 Logical Cores) | 2 (128 Logical Cores) |
| Memory | 1024 GiB DDR4 3200 RAM | 1024 GiB DDR4 3200 RAM |
| GPU | 8 NVIDIA A100 80GB SXM4 GPUs | 8 NVIDIA A100 80GB SXM4 GPUs |
| Local Storage (Scratch Space) | 2x 480GB SSD | 2x 480GB SSD |
#### HPE DL385 Gen10+ v2
| Attribute\Node Name | cn02 |
| ----------------------------- | -------------------------------- |
| Model Name | HPE ProLiant DL385 Gen10 Plus v2 |
| Sockets | 2 (AMD EPYC 7513 @ 2.60 GHz) |
| Cores per Socket               | 32 (64 physical cores total)     |
| Threads per Core | 2 (128 Logical Cores) |
| Memory | 256GiB DDR4 RAM |
| GPU | 2 NVIDIA A30 24GB HBM2 PCIe GPUs |
| Local Storage (Scratch Space) | 854 GB                           |
#### HPE DL380a Gen11
| Attribute\Node Name | gpu6 | gpu7 |
| ----------------------------- | -------------------------------------------- | -------------------------------------------- |
| Model Name | HPE DL380a Gen11 | HPE DL380a Gen11 |
| Sockets | 2 (Intel Xeon-P 8462Y+ @ 2.8GHz) | 2 (Intel Xeon-P 8462Y+ @ 2.8GHz) |
| Cores per Socket               | 32 (64 physical cores total)                 | 32 (64 physical cores total)                 |
| Threads per Core | 2 (128 Logical Cores) | 2 (128 Logical Cores) |
| Memory | 512 GiB DDR5 RAM | 512 GiB DDR5 RAM |
| GPU | 2 NVIDIA H100 80GB GPUs (NVAIE subscription) | 2 NVIDIA H100 80GB GPUs (NVAIE subscription) |
| Network | 4-port GbE, 1-port HDR200 InfiniBand | 4-port GbE, 1-port HDR200 InfiniBand |
| Local Storage (Scratch Space) | 1TB SSD | 1TB SSD |
#### Cray XD665 Nodes
| Attribute\Node Name | cray01 | cray02 |
| ----------------------------- | -------------------------------------- | -------------------------------------- |
| Model Name | Cray XD665 | Cray XD665 |
| Sockets | 2 (AMD EPYC Genoa 9334 @ 2.7GHz) | 2 (AMD EPYC Genoa 9334 @ 2.7GHz) |
| Cores per Socket               | 32 (64 physical cores total)           | 32 (64 physical cores total)           |
| Threads per Core | 2 (128 Logical Cores) | 2 (128 Logical Cores) |
| Memory | 768 GiB DDR5 RAM | 768 GiB DDR5 RAM |
| GPU | 4 NVIDIA HGX H100 80GB SXM GPUs | 4 NVIDIA HGX H100 80GB SXM GPUs |
| Network | 2-port 10GbE, 1-port HDR200 InfiniBand | 2-port 10GbE, 1-port HDR200 InfiniBand |
| Local Storage (Scratch Space) | 1TB SSD | 1TB SSD |
#### Cray XD670 Node
| Attribute\Node Name | cray03 |
| ----------------------------- | -------------------------------------- |
| Model Name | Cray XD670 |
| Sockets | 2 (Intel Xeon-P 8462Y+ @ 2.8GHz) |
| Cores per Socket               | 32 (64 physical cores total)           |
| Threads per Core | 2 (128 Logical Cores) |
| Memory | 2048 GiB DDR5 RAM |
| GPU | 8 NVIDIA HGX H100 80GB SXM GPUs |
| Network | 2-port 10GbE, 1-port HDR200 InfiniBand |
| Local Storage (Scratch Space) | 2TB SSD |
### Storage System
Our storage system consists of four HPE PFSS nodes, collectively offering 63 TB of storage. Because this is a **Parallel File System Storage** component, you can think of these four nodes as one unified 63 TB storage unit. The nodes work in parallel and are all mounted under **one** mount point (`/fs1`) on the GPU nodes only.
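For instance, on a GPU node you can confirm the mount and check usage with standard tools; the commands below are a sketch, and `<project>` is a placeholder for your own project directory:

```bash
# Show capacity and usage of the parallel file system mount
df -h /fs1

# Show how much space a project directory uses (replace <project>)
du -sh /fs1/proj/<project>
```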
## Our vision
@@ -60,19 +127,17 @@ Making complex and time-intensive calculations simple and accessible.
Our heart is set on creating a community where our cluster is a symbol of collaboration and discovery. We wish to provide a supportive space where researchers and students can express their scientific ideas and explore uncharted areas. We aim to make the complicated world of computational research a shared path of growth, learning, and significant discoveries for those eager to learn.
## Operations Team
- Alexander Rosenberg
- Mani Tofigh
## The Board
- Edward H. Currie
- Daniel P. Miller
- Adam C. Durst
- Jason D. Williams
- Thomas G. Re
- Oren Segal
- John Ortega
@@ -87,5 +87,5 @@ Project-specific directories may be created upon request for shared storage among
To make proper use of the cluster, please familiarize yourself with the basics of using Slurm, fundamental HPC concepts, and the cluster's architecture.
You may be familiar with the `.bashrc`, `.bash_profile`, or `.cshrc` files for environment customization. To support different environments needed for different software packages, [environment modules]({{site.baseurl}}{% link software/env-modules.md %}) are used. Modules allow you to load and unload various software environments tailored to your computational tasks.
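As a simple illustration, a typical module workflow looks like the following; the module name and version are placeholders, since the modules actually available depend on the cluster:

```bash
module avail                 # list the modules available on the system
module load python/3.11      # load a specific software environment (example name)
module list                  # show currently loaded modules
module unload python/3.11    # remove it from your environment again
```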
# Virtual Environment Guide
Managing software dependencies and configurations can be challenging in an HPC environment. Users often need different versions of the same software or libraries, leading to conflicts and complex setups. [Environment modules]({{site.baseurl}}{% link software/env-modules.md %}) provide a solution by allowing users to dynamically modify their shell environment using simple commands. This simplifies the setup process, ensures that users have the correct software environment for their applications, and reduces conflicts and errors caused by incompatible software versions. Environment modules work on the same principle as virtual environments, i.e. the manipulation of environment variables. If an environment module is not available for a given version you need, you can instead create a virtual environment using the standard version manager tools provided with many common languages. Virtual environments allow you to manage different versions of languages and dependencies independently of the system version or other virtual environments, so they are often used by developers to isolate dependencies for different projects.
This guide provides different methods for creating virtual environments and managing dependencies across multiple languages, including Python, R, Julia, Rust, C, C++, and others. This allows you to create projects in isolated environments and install dependencies without root or sudo access.
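As a quick sketch, the snippet below creates an isolated Python environment in your home directory without root or sudo access; the path and package name are only examples:

```bash
# Create a virtual environment under your home directory (example path)
python3 -m venv ~/envs/myproject

# Activate it; packages installed now stay inside this environment
source ~/envs/myproject/bin/activate
pip install numpy

# Leave the environment when finished
deactivate
```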
## Python