Commit 1d64f878 authored by Mani Tofigh's avatar Mani Tofigh

1) Deleted Stallo's env module page and added our own completed version under the Software section. 2) Deleted Guides folder and moved the Slurm doc under the Jobs section (will further have to incorporate both versions)
parent 891dc118
@@ -3,3 +3,4 @@ _site
.sass-cache
Gemfile.lock
.DS_Store
@@ -3,3 +3,5 @@ source "https://rubygems.org"
gem "jekyll-rtd-theme", "~> 2.0.10"
gem "github-pages", group: :jekyll_plugins
gem "webrick"
---
sort: 3
sort: 2
---
# Account
......
---
sort: 10
---
# Environment Modules
To compile and run jobs on a cluster, a special shell environment is typically set up for the software that is used. However, setting up the right environment for a particular software package and version can be tricky, and it can be hard to keep track of how it was set up.
For example, users want a clean way to bring up the right environment for compiling code against the various MPI implementations, but it is easy to lose track of which libraries have been used or are needed, and one can end up with multiple similarly named libraries installed in a disorganized manner.
One might also like to conveniently test new versions of a software package before permanently installing the package.
On a Linux system without special utilities, setting up environments can be complex. To simplify this task, HPC clusters often make use of the environment modules package, which provides the `module` command. The `module` command is a handy utility that makes taking care of the shell environment much easier.
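For instance, loading a module adjusts environment variables such as `PATH` behind the scenes, so the right compiler or library is found automatically. A minimal sketch (the module version follows the listings shown later; the exact paths reported will differ per cluster):
```
[me@cluster ~]$ module load gcc/11.2.0    # prepends the compiler's bin directory to PATH
[me@cluster ~]$ which gcc                 # now resolves to the gcc provided by the module
[me@cluster ~]$ module unload gcc/11.2.0  # reverts the environment changes
```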
## Available Commands
Practical use of the module commands is given in the following sections.
For reference, the help text for the module command can be viewed as follows:
```
[me@cluster ~]$ module --help
Modules Release 4.5.3 (2020-08-31)
Usage: module [options] [command] [args ...]
Loading / Unloading commands:
add | load modulefile [...] Load modulefile(s)
rm | unload modulefile [...] Remove modulefile(s)
purge Unload all loaded modulefiles
reload | refresh Unload then load all loaded modulefiles
switch | swap [mod1] mod2 Unload mod1 and load mod2
Listing / Searching commands:
list [-t|-l|-j] List loaded modules
avail [-d|-L] [-t|-l|-j] [-S|-C] [--indepth|--no-indepth] [mod ...]
List all or matching available modules
aliases List all module aliases
whatis [-j] [modulefile ...] Print whatis information of modulefile(s)
apropos | keyword | search [-j] str
Search all name and whatis containing str
is-loaded [modulefile ...] Test if any of the modulefile(s) are loaded
is-avail modulefile [...] Is any of the modulefile(s) available
info-loaded modulefile Get full name of matching loaded module(s)
Collection of modules handling commands:
save [collection|file] Save current module list to collection
restore [collection|file] Restore module list from collection or file
saverm [collection] Remove saved collection
saveshow [collection|file] Display information about collection
savelist [-t|-l|-j] List all saved collections
is-saved [collection ...] Test if any of the collection(s) exists
Shell's initialization files handling commands:
initlist List all modules loaded from init file
initadd modulefile [...] Add modulefile to shell init file
initrm modulefile [...] Remove modulefile from shell init file
initprepend modulefile [...] Add to beginning of list in init file
initswitch mod1 mod2 Switch mod1 with mod2 from init file
initclear Clear all modulefiles from init file
Environment direct handling commands:
prepend-path [-d c] var val [...] Prepend value to environment variable
append-path [-d c] var val [...] Append value to environment variable
remove-path [-d c] var val [...] Remove value from environment variable
Other commands:
help [modulefile ...] Print this or modulefile(s) help info
display | show modulefile [...] Display information about modulefile(s)
test [modulefile ...] Test modulefile(s)
use [-a|-p] dir [...] Add dir(s) to MODULEPATH variable
unuse dir [...] Remove dir(s) from MODULEPATH variable
is-used [dir ...] Is any of the dir(s) enabled in MODULEPATH
path modulefile Print modulefile path
paths modulefile Print path of matching available modules
clear [-f] Reset Modules-specific runtime information
source scriptfile [...] Execute scriptfile(s)
config [--dump-state|name [val]] Display or set Modules configuration
Switches:
-t | --terse Display output in terse format
-l | --long Display output in long format
-j | --json Display output in JSON format
-d | --default Only show default versions available
-L | --latest Only show latest versions available
-S | --starts-with
Search modules whose name begins with query string
-C | --contains Search modules whose name contains query string
-i | --icase Case insensitive match
-a | --append Append directory to MODULEPATH
-p | --prepend Prepend directory to MODULEPATH
--auto Enable automated module handling mode
--no-auto Disable automated module handling mode
-f | --force By-pass dependency consistency or confirmation dialog
Options:
-h | --help This usage info
-V | --version Module version
-D | --debug Enable debug messages
-v | --verbose Enable verbose messages
-s | --silent Turn off error, warning and informational messages
--paginate Pipe mesg output into a pager if stream attached to terminal
--no-pager Do not pipe message output into a pager
--color[=WHEN] Colorize the output; WHEN can be 'always' (default if
omitted), 'auto' or 'never'
[me@cluster ~]$
```
## Managing Environment Modules
There is a good chance the cluster administrator has set up the user’s account, `fred` for example, so that some modules are loaded already by default. In that case, the modules loaded into the user’s environment can be seen with the `module list` command:
```
[fred@cluster ~]$ module list
Currently Loaded Modulefiles:
1) shared 2) slurm/slurm/21.08.8 3) gcc/11.2.0
```
If there are no modules loaded by default, then `module list` simply reports that no modulefiles are currently loaded.
How does one know what modules are available? The `module avail` command lists all modules that are available for loading:
```
[fred@cluster ~]$ module avail
---------------------------- /cm/local/modulefiles -----------------------------
apptainer/1.0.2 cmsh module-git
boost/1.77.0 cuda-dcgm/3.1.3.1 module-info
cluster-tools/9.2 dot null
cm-bios-tools freeipmi/1.6.8 openldap
cm-image/9.2 gcc/11.2.0 openmpi/mlnx/gcc/64/4.1.5a1
cm-scale/cm-scale.module ipmitool/1.8.18 python3
cm-setup/9.2 lua/5.4.4 python39
cmd luajit shared
cmjob mariadb-libs slurm/slurm/21.08.8
---------------------------- /cm/shared/modulefiles ----------------------------
blacs/openmpi/gcc/64/1.1patch03 hwloc/1.11.11
blas/gcc/64/3.10.0 hwloc2/2.7.1
bonnie++/2.00a intel-tbb-oss/ia32/2021.4.0
cm-pmix3/3.1.4 intel-tbb-oss/intel64/2021.4.0
cm-pmix4/4.1.1 iozone/3_492
cuda11.8/blas/11.8.0 lapack/gcc/64/3.10.0
cuda11.8/fft/11.8.0 mpich/ge/gcc/64/3.4.2
cuda11.8/toolkit/11.8.0 mvapich2/gcc/64/2.3.7
default-environment netcdf/gcc/64/gcc/64/4.8.1
fftw3/openmpi/gcc/64/3.3.10 netperf/2.7.0
gdb/10.2 openblas/dynamic/(default)
git/2.33.1 openblas/dynamic/0.3.18
globalarrays/openmpi/gcc/64/5.8 openmpi/gcc/64/4.1.2
hdf5/1.12.1 openmpi4/gcc/4.1.2
hdf5_18/1.8.21 ucx/1.10.1
```
In the list there are two kinds of modules:
* **local modules**, which are specific to the node, or head node only
* **shared modules**, which are made available from a shared storage, and which only become available for loading after the `shared` module is loaded.
Modules can be loaded using the `add` or `load` subcommands. Several modules can be loaded at once by separating them with spaces:
```
[fred@cluster ~]$ module add shared gcc openmpi/gcc
```
Tab completion works for suggesting modules for the add/load commands. If the tab completion suggestion is unique, it is enough to specify the module even without the full path. For example, looking at the available modules listed by the avail command previously, specifying `gcc` is enough to select `gcc/11.2.0`, because there is no other directory path under `gcc/` besides `11.2.0`.
To remove one or more modules, the `module unload` or `module rm` command is used.
To remove all modules from the user’s environment, the `module purge` command is used.
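For example (a short sketch using the modules loaded earlier):
```
[fred@cluster ~]$ module rm gcc       # unload a single module
[fred@cluster ~]$ module purge        # unload all loaded modules
[fred@cluster ~]$ module list
No Modulefiles Currently Loaded.
```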
Users should be aware that some loaded modules can conflict with others loaded at the same time. This can happen with MPI modules. For example, loading `openmpi/gcc` without removing an already loaded `intel/mpi/64` can result in conflicts about which compiler should be used.
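The clean way to change between such alternatives is a single swap, which unloads the first module before loading the second. A sketch using modules from the `avail` listing above:
```
[fred@cluster ~]$ module switch openmpi/gcc mpich/ge/gcc/64   # unload openmpi/gcc, load mpich
```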
### The `shared` Module
The `shared` module provides access to shared libraries. By default these are under `/cm/shared`.
The `shared` module is special because often other modules, as seen under `/cm/shared/modulefiles`, depend on it. So, if it is to be loaded, then it is usually loaded first, so that the dependent modules can use it.
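For example, to make the shared OpenMPI module loadable:
```
[fred@cluster ~]$ module load shared
[fred@cluster ~]$ module load openmpi/gcc
```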
The shared module is a particularly useful local module, and is therefore often configured to be loaded for the user by default. Setting the default environment modules is discussed in the next section.
## Changing The Default Environment Modules
Manually loading the same modules at every login would be inefficient. That is why the user can set up an initial default state for modules using the `init*` subcommands. The most useful of these are:
* `module initadd`: add a module to the initial state
* `module initrm`: remove a module from the initial state
* `module initlist`: list all modules loaded initially
* `module initclear`: clear all modules from the list of modules loaded initially
Example:
```
[fred@cluster ~]$ module initclear
[fred@cluster ~]$ module initlist
bash initialization file $HOME/.bashrc loads modules:
[fred@cluster ~]$ module initadd shared gcc openmpi/gcc
[fred@cluster ~]$ module initlist
bash initialization file $HOME/.bashrc loads modules:
shared gcc openmpi/gcc
```
In the preceding example, the modules defined as the user's new initial environment are loaded from the next login onward.
Example:
```
[fred@cluster ~]$ module list
No Modulefiles Currently Loaded.
[fred@cluster ~]$ exit
logout
Connection to bright92 closed
[root@basejumper ~]# ssh fred@cluster
fred@cluster's password:
...
[fred@cluster ~]$ module list
Currently Loaded Modulefiles:
1) shared 2) gcc/9.2.0 3) openmpi/gcc/64/1.10.7
[fred@cluster ~]$
```
If you are unsure what a module does, it can be checked using `module whatis`:
```
$ module whatis openmpi/gcc
----------------------------------- /cm/shared/modulefiles ------------------------------------
openmpi/gcc/64/1.10.7: adds OpenMPI to your environment variables
```
The man pages for `module` and `modulefile` give further details on usage.
---
sort: 100
---
# Guides
{% include list.liquid all=true %}
---
sort: 2
sort: 1
---
# Getting Help
......
---
sort: 3
---
# Slurm
Slurm Workload Manager, or SLURM (Simple Linux Utility for Resource Management), is a free and open-source job scheduler for managing workloads on Linux and Unix-based clusters, grids, and supercomputers. It is widely used in high-performance computing (HPC) environments, where it manages the allocation of resources such as CPU time, memory, and storage across a large number of compute nodes.

Slurm provides tools for users to submit, monitor, and control the execution of their jobs. Other key features include support for parallel and serial job execution, job dependencies and job arrays, resource reservations and QoS (quality of service), and job priority and backfilling.

Slurm's modular design makes it highly configurable, so it can be tailored to meet a wide variety of needs in different environments. It is widely used in academia and industry, and is supported by a large and active community of users and developers.
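As a brief illustration of the workflow (a minimal sketch; the job script below uses placeholder resource values, and the options available differ per cluster):
```bash
#!/bin/bash
#SBATCH --job-name=hello        # name shown in the queue
#SBATCH --nodes=1               # number of compute nodes
#SBATCH --ntasks=1              # number of tasks (processes)
#SBATCH --time=00:05:00         # wall-clock limit, HH:MM:SS
#SBATCH --mem=1G                # memory per node

echo "Hello from $(hostname)"
```
Such a script would be submitted with `sbatch`, monitored with `squeue`, and cancelled with `scancel`.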
# Environment modules
## Introduction to Environment Modules
In an HPC cluster, the diversity and quantity of installed software span many applications in various versions. Often, these applications are installed in non-standard locations for ease of maintenance, practicality, and security reasons. Due to the shared nature of the HPC cluster and its significant scale compared to typical desktop compute machinery, it's neither possible nor desirable to use all these different software versions simultaneously, as conflicts between them may arise.

To manage this complexity, we provide the production environment for each application outside of the application itself, through a set of instructions and variable settings known as an application module. This approach not only prevents conflicts but also simplifies control over which application versions are available for use in any given session. We utilize the `lmod` module system for this purpose, with the `module` command being the primary tool for managing these software environments.
For example, if a user needs to work with a specific Python environment provided by Anaconda, they can simply load the Anaconda module by executing `module load anaconda`.
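A short sketch of that workflow (the exact module name may vary per cluster; `module avail anaconda` will show what is installed):
```bash
module load anaconda   # make Anaconda's Python the active interpreter
python --version       # now reports the Python shipped with Anaconda
```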
This is just one instance of how the module command can be utilized. In the following sections, we will fully discuss the module command and its use cases.
For a complete list of options with the `module` command:
```bash
[me@cluster ~]$ module --help
```
## Loading and Managing Modules
### Checking Loaded Modules
To see the modules currently active in your session:
```bash
module list
```
### Listing Available Modules
To view all available modules:
```bash
module avail
```
The list will include both local modules (specific to the node or head node) and shared modules (available from shared storage).
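The listing can be long; passing a name narrows it to matching modules only, for example:
```bash
module avail gcc
```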
### Loading a Module
To load a module, for example, `gcc`:
```bash
module load gcc
```
To load a specific version of a module:
```bash
module load gcc/11.2.0
```
### Unloading Modules
To unload a module:
```bash
module unload gcc
```
### Switching Module Versions
To switch to a different version of a module:
```bash
module switch intel intel/2016b
```
### Avoiding Module Conflicts
Be aware of potential conflicts, especially with MPI modules. Loading conflicting modules like `openmpi` and `impi` simultaneously should be avoided.
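A simple way to stay safe is to unload one MPI stack before loading the other (a sketch; the module names are illustrative):
```bash
module unload openmpi   # drop the currently loaded MPI stack, if any
module load impi        # then load the alternative implementation
```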
### Using the `shared` Module
The `shared` module provides access to shared libraries and is often a dependency for other modules. It is typically loaded first:
```bash
module load shared
```
### Setting Default Modules
To avoid manually loading the same modules every login, users can set an initial default state for modules using `module init*` subcommands:
* **Add a module to initial state**: `module initadd <module_name>`
* **Remove a module from initial state**: `module initrm <module_name>`
* **List initial modules**: `module initlist`
* **Clear initial modules**: `module initclear`
Example:
```bash
module initclear
module initadd shared gcc openmpi/gcc
```
### Available Commands and Practical Usage
For practical use of the modules commands:
* **Loading and unloading modules**: `module load <module_name>`, `module unload <module_name>`
* **Listing loaded and available modules**: `module list`, `module avail`
* **Switching modules**: `module switch <old_module> <new_module>`
* **Finding out what a module does**: `module whatis <module_name>`
Example of loading modules:
```bash
[fred@cluster ~]$ module load shared gcc openmpi/gcc
```
> Tab completion is available for suggesting modules for the add/load commands.
Example of unloading modules:
```bash
[fred@cluster ~]$ module unload gcc openmpi/gcc
```
### Managing the Default Environment
Users can customize their default environment using module `init*` commands. This ensures the desired modules are automatically loaded at login.
Example:
```bash
[fred@cluster ~]$ module initclear
[fred@cluster ~]$ module initadd shared gcc openmpi/gcc
[fred@cluster ~]$ module initlist
```
## Additional Information
* **Conflicts and Dependencies**: Users should be mindful of conflicts between modules and dependencies, particularly with MPI implementations.
* **Testing New Software Versions**: Modules allow for easy testing of new software versions without permanent installation, as sketched below.
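For instance, a newer version can be tried out for the current session only (a sketch; the version numbers are illustrative):
```bash
module switch gcc gcc/12.1.0   # swap in the newer compiler for this session
# ... build and run tests ...
module switch gcc/12.1.0 gcc   # swap back to the default version
```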
For further details, users are encouraged to refer to the man pages for `module` and `modulefile`:
```bash
man module
man modulefile
```
\ No newline at end of file
---
sort: 1
---
# Star cluster
{% include list.liquid all=true %}