Commit 70a108bf authored by Mani Tofigh

deleted: storage/lustre-performance.md and some restructured files from Stallo.

added: about-star.md and the Introduction content.
renamed: Star folder to 'Getting Started' and moved quickstart.md under it.
parent 56f98ad6
# Star cluster
# Getting Started
{% include list.liquid all=true %}
# About Star
## Introduction
The Star HPC Cluster is a computing facility designed to cater to a wide range of computational and data-intensive research needs. It serves as a powerful tool for scientists, engineers, and researchers, enabling them to tackle complex scientific problems that require substantial computational resources. The cluster combines high-performance computing hardware with a suite of versatile software applications.
Effective management and utilization of resources is essential at Star, and we use SLURM for this purpose. SLURM (Simple Linux Utility for Resource Management) is a job scheduler that manages the allocation of computational resources. It ensures that users' computational tasks are queued and processed effectively, and that everyone gets their fair share of resources at the right time.
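As an illustration, a typical interaction with SLURM from the command line looks roughly like the sketch below; the script name and the job ID are placeholders:

```bash
# submit a job script to the queue
sbatch myjob.sh

# check the state of your own pending and running jobs
squeue -u $USER

# cancel a job you no longer need (12345 is a placeholder job ID)
scancel 12345
```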
Apptainer (formerly known as Singularity) is also used extensively on the cluster. Singularity/Apptainer provides a containerization platform that allows users to create and deploy applications in a consistent, reproducible, and portable manner. If you have used Docker before, Singularity will not be a new concept: it works much like Docker, except that containers run under the user's own privileges, whereas Docker runs everything as `root`. Furthermore, in conjunction with NGC (NVIDIA GPU Cloud), Singularity/Apptainer enables users to leverage a wide array of pre-built containers. NGC offers a comprehensive catalog of GPU-optimized software containers for deep learning, machine learning, and HPC applications. This integration gives users access to dozens of ready-to-use containers, eliminating the need to set up these applications from scratch.
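For example, pulling a GPU-optimized image from the NGC catalog and running a command inside it might look like the sketch below; the image tag is only illustrative, and the `--nv` flag exposes the host's NVIDIA GPUs inside the container:

```bash
# pull a container image from NGC into a local .sif file
apptainer pull tensorflow.sif docker://nvcr.io/nvidia/tensorflow:23.10-tf2-py3

# run a command inside the container with GPU support enabled
apptainer exec --nv tensorflow.sif python -c "import tensorflow as tf; print(tf.__version__)"
```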
The cluster is equipped with a range of software applications, each serving specific purposes. For instance, Python and R are included for data analysis and statistical computing, MATLAB for high-level technical computing, Jupyter for interactive data science and scientific computing, and OpenMPI for efficient parallel computing. Anaconda offers a comprehensive package for scientific computing and data science in Python and R, while NetCDF is vital for the manipulation and storage of large scientific datasets. For handling big data, Hadoop/Spark is also available.
## Hardware
### Login Node
### Compute Nodes
#### HPE Apollo 6500 Gen10
#### HPE DL365 Gen10
### Storage System
## Software
### SLURM
### Singularity/Apptainer
### Python
### Jupyter
### MATLAB
### R
### OpenMPI
### Anaconda
### NetCDF
### Hadoop/Spark
---
sort: 1
---
# Quick Start Guide
## Account Access
......
.. _getting_started:
===============
Getting started
===============
Here you will get the basics for working with Stallo. Please carefully study the links at the end of each paragraph to get more detailed information.
Get an account
--------------
If you are associated with UiT The Arctic University of Norway, you may apply locally. :doc:`/account/uitquota`
You can also apply for an account for Stallo or any of the other Norwegian computer clusters at the `Metacenter account application <https://www.metacenter.no/user/application/form/notur/>`_. This is also possible if you already have a local account. :doc:`/account/account`
Change temporary password
-------------------------
The password you got by SMS has to be changed on `MAS <https://www.metacenter.no/user/login/?next=/user/password/>`_ within one week, or else the login account will be closed again and you will need to contact us to have it reopened.
You can't use the temporary password for logging in to Stallo.
Connect to Stallo
-----------------
You may connect to Stallo via *SSH* to ``stallo.uit.no``. This means that on Linux and OSX you can connect directly by opening a terminal and typing ``ssh username@stallo.uit.no``. From Windows, you may connect via PuTTY, which is available in the Software Center. X-forwarding for graphical applications is possible. There is also a web interface that allows easy graphical login. Please see the following link for details on all the methods mentioned. :doc:`/account/login`
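
For example, a typical login from a Linux or OSX terminal could look like this (replace ``username`` with your own user name)::

    # plain SSH login
    ssh username@stallo.uit.no

    # SSH login with X-forwarding enabled, for graphical applications
    ssh -Y username@stallo.uit.no
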
On nodes and files
------------------
When you login, you will be on a login node. Do *not* run any long-lasting programs here. The login node shall only be used for job preparation (see below) and simple file operations.
You will also be in your home directory ``/home/username``. Here, you have 300 GB at your disposal, which is backed up regularly. For actual work, please use the global work area at ``/global/work/username``. This space is not backed up, but it has good performance and is 1000 TB in size. Please remove old files regularly. :doc:`/storage/storage`
To move files from your computer to Stallo or vice versa, you may use any tool that works with *ssh*. On Linux and OSX, these are scp, rsync, or similar programs. On Windows, you may use WinSCP. :doc:`/storage/file_transfer`
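
As an illustration, copying data from your own machine with ``scp`` or ``rsync`` could look like this (file and directory names are placeholders)::

    # copy a single file to your home directory on Stallo
    scp results.dat username@stallo.uit.no:/home/username/

    # synchronize a whole directory into the global work area
    rsync -avz mydata/ username@stallo.uit.no:/global/work/username/mydata/
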
Run a program
-------------
There are many programs pre-installed. You can get a list of all programs by typing ``module avail``. You can also search within that list: ``module avail blast`` will search for Blast (case-insensitive). When you have found your program of choice, you can load it using ``module load BLAST+/2.7.1-intel-2017b-Python-2.7.14``. All program files will then be available, i.e. you can simply call ``blastp -version`` to run Blast and check the loaded version. You can also compile your own software, if necessary. :doc:`/software/modules`
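
A typical module session, using the Blast example above, could look like this::

    # list all available modules, then search for Blast (case-insensitive)
    module avail
    module avail blast

    # load a specific version and check that the program is now available
    module load BLAST+/2.7.1-intel-2017b-Python-2.7.14
    blastp -version
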
To eventually run the program, you have to write a job script. In this script, you can define how long the job (i.e. the program) will run and how much memory and compute cores it needs. For the actual computation, you need to learn at least the basics of Linux shell scripting. You can learn some basics here: :doc:`/account/linux`.
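
A minimal job script could look like the sketch below; the account name, the resource values and the program call are placeholders that you have to adapt to your own project and application::

    #!/bin/bash
    #SBATCH --job-name=myjob        # a name for the job
    #SBATCH --account=nnXXXXk       # your project account (placeholder)
    #SBATCH --time=01:00:00         # wall-clock time limit (hh:mm:ss)
    #SBATCH --ntasks=1              # number of tasks (cores)
    #SBATCH --mem-per-cpu=1G        # memory per core

    # load the software the job needs, then run it
    module load BLAST+/2.7.1-intel-2017b-Python-2.7.14
    blastp -query input.fasta -db mydb -out result.txt
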
Once you have written the job script, you can submit it with ``sbatch jobscript.sh``. This will put the script in the queue, where it will wait until an appropriate compute node is available. You can see the status of your job with ``squeue -u username``. :doc:`/jobs/batch` and :doc:`/jobs/examples`
Every job that gets started will be charged to your quota. Your quota is calculated in hours of CPU time and is connected to your specific project. To see the status of your quota account(s), type ``cost`` :doc:`/account/accounting`
Get help
--------
Do you need help with Stallo? Send us an email at support@metacenter.no. You can also use that address to request new software (either an update or entirely new software), suggest changes to this documentation, or send us any other suggestions or issues concerning Stallo. Please also read the rest of this documentation.
Happy researching!
.. _stallo_shutdown:
===============
Stallo Shutdown
===============
Stallo is getting old and will be shut down this year.
Due to its age, hardware failures are causing more and more nodes to fail.
The system will stay in production and continue **service until
at least 31 December 2020**.
We will help you find alternatives for your computational and
storage needs and assist with moving your workflows and data to one of our
other machines like Betzy, Saga and Fram.
If you have questions, special needs or problems, please contact us at
migration@metacenter.no
Alternatives
============
For an overview of alternatives, especially for local projects, please
have a look at `this presentation <https://docs.google.com/presentation/d/1tkXTj9L_9grOYKXcaKmP12jF5_5YzIuj6g4y8X78IUw/edit?usp=sharing>`_.
Computational resources
------------------------
- `Betzy <https://documentation.sigma2.no/hpc_machines/betzy.html>`_
- `Saga <https://documentation.sigma2.no/hpc_machines/saga.html>`_
- `Fram <https://documentation.sigma2.no/hpc_machines/fram.html>`_
- Kubernetes-based cloud infrastructure:
  `NIRD toolkit <https://www.sigma2.no/nird-toolkit>`_
Storage
-------
- `NIRD <https://documentation.sigma2.no/files_storage/nird.html>`_
- `UiT research storage <https://uit.no/infrastruktur/enhet?p_document_id=668862>`_ for short- to mid-term storage of work-in-progress datasets
  (only for UiT researchers)
- `UiT Research Data Portal <https://en.uit.no/forskning/art?p_document_id=548687>`_ for publishing final datasets and results
- `Overview <https://www.re3data.org>`_ of public research data repositories
.. _about_stallo:
============
About Stallo
============
Resource description
====================
Key numbers about the Stallo cluster: compute nodes, node interconnect,
operating system, and storage configuration.
+-------------------------+------------------------------------------+------------------------------------------+
|                         | Aggregated                               | Per node                                 |
+=========================+==========================================+==========================================+
| Peak performance        | 312 Teraflop/s                           | 332 Gigaflop/s / 448 Gigaflop/s          |
+-------------------------+------------------------------------------+------------------------------------------+
| # Nodes                 | | 304 x HP BL460 gen8 blade servers      | | 1 x HP BL460 gen8 blade server         |
|                         | | 328 x HP SL230 gen8 servers            | | 1 x HP SL230 gen8 server               |
+-------------------------+------------------------------------------+------------------------------------------+
| # CPUs / # Cores        | | 608 / 4864                             | | 2 / 16                                 |
|                         | | 656 / 6560                             | | 2 / 20                                 |
+-------------------------+------------------------------------------+------------------------------------------+
| Processors              | | 608 x 2.60 GHz Intel Xeon E5 2670      | | 2 x 2.60 GHz Intel Xeon E5 2670        |
|                         | | 656 x 2.80 GHz Intel Xeon E5 2680      | | 2 x 2.80 GHz Intel Xeon E5 2680        |
+-------------------------+------------------------------------------+------------------------------------------+
| Total memory            | 26.2 TB                                  | 32 GB (32 nodes with 128 GB)             |
+-------------------------+------------------------------------------+------------------------------------------+
| Internal storage        | 155.2 TB                                 | 500 GB (32 nodes with 600 GB raid)       |
+-------------------------+------------------------------------------+------------------------------------------+
| Centralized storage     | 2000 TB                                  | 2000 TB                                  |
+-------------------------+------------------------------------------+------------------------------------------+
| Interconnect            | Gigabit Ethernet + Infiniband :sup:`1`   | Gigabit Ethernet + Infiniband :sup:`1`   |
+-------------------------+------------------------------------------+------------------------------------------+

+-------------------------+----------+
| Compute racks           | 11       |
+-------------------------+----------+
| Infrastructure racks    | 2        |
+-------------------------+----------+
| Storage racks           | 3        |
+-------------------------+----------+
 
1) All nodes in the cluster are connected with Gigabit Ethernet and
QDR Infiniband.
 
.. _linux-cluster:
Stallo - a Linux cluster 
========================
This is just a quick and brief introduction to the general features of Linux Clusters.
A Linux Cluster - one machine, consisting of many machines
----------------------------------------------------------
On the one hand, you can look at large Linux clusters as rather large and powerful supercomputers; on the other hand, you can look at them as just a large bunch of servers and some storage system(s) connected to each other through a (high-speed) network. Both of these views are fully correct, and it is therefore important to be aware of the strengths and the limitations of such a system.
Clusters vs. SMP’s
------------------
Until July 2004, most of the supercomputers available to Norwegian HPC users were more or less large Symmetric Multi-Processing (SMP) systems, like the HP Superdomes at UiO and UiT, the IBM Regatta at UiB, and the SGI Origin and IBM p575 systems at NTNU.
On SMP systems most of the resources (CPU, memory, home disks, work disks, etc.) are more or less uniformly accessible to any job running on the system. This is a rather simple picture to understand: it is nearly like your desktop machine, just with more of everything - more users, more CPUs, more memory, more disks, etc.
On a Linux cluster the picture is quite different. The system consists of several independent compute nodes (servers) connected to each other through some (high-speed) network and possibly hooked up to some storage system. So the hardware resources (like CPU, memory, disk, etc.) in a cluster are in general distributed and only locally accessible on each server.
Linux operating system (Rocks): `<http://www.rocksclusters.org/>`_
==================================================================
Since 2003, the HPC group at UiT has been one of five international
development sites for the Linux operating system Rocks. Together with
people in Singapore, Thailand, Korea and the USA, we have developed a tool
that has won international recognition, such as the award for "Most
important software innovation" from HPCWire in both 2004 and 2005. Now
Rocks is a de-facto standard for cluster management in Norwegian
supercomputing.
Stallo - Sami mythology
========================
In the folklore of the Sami, a Stallo (also Stallu or Stalo) is a Sami wizard.
"The Sami traditions up North differ a bit from other parts of Norwegian
traditions. You will find troll and draug and some other creatures as well,
but the Stallo is purely Sami. He can change into all kinds of beings;
animals, human beings, plants, insects, bird – anything. He can also “turn”
the landscape so you miss your direction or change it so you don’t recognise
familiar surroundings. Stallo is very rich and smart, he owns silver and
reindeers galore, but he always wants more. To get what he wants he tries to
trick the Samis into traps, and the most popular Sami stories tell how people
manage to fool Stallo." NB! Don't confuse Stallo with the noaide, a real
wizard whom people still believe in.
.. _guidelines:
Guidelines for use of computer equipment at UiT The Arctic University of Norway
===================================================================================
Definitions
-----------
Explanation of words and expressions used in these guidelines.
users:
Every person who has access to and who uses the University's
computer equipment. This includes employees, students and others who
are granted access to the computer equipment.
user contract:
Agreement between users and department(s) at the University who
regulate the user's right of access to the computer equipment. The
user contract is also a confirmation that these guidelines are
accepted by the user.
data:
All information that is on the computer equipment. This includes
both the contents of data files and software.
computer network:
Hardware and/or software which makes it possible to establish a
connection between two or more computer terminals. This includes
both private, local, national and international computer networks
which are accessible through the computer equipment.
breakdown (stoppage):
Disturbances and abnormalities which prevent the user from
maximum utilization of the computer equipment.
computer equipment:
This includes hardware, software, data, services and computer
network.
hardware:
Mechanical equipment that can be used for data processing.
private data:
Data found in reserved or private areas or that are marked as
private. The data in a user's account is to be regarded as private
irrespective of the rights attached to the data.
resources:
Resources refers to the computer equipment including time and the
capacity available for the persons who are connected to the
equipment.
Purpose
-------
The purpose of these guidelines is to contribute towards the development of a
computer environment in which the potential provided by the computer equipment
can be utilized in the best possible way by the University and by society at
large. This is to promote education and research and to disseminate knowledge
about scientific methods and results.
Application
-----------
These guidelines apply to the use of the University's computer equipment and
apply to all users who are granted access to the computer equipment. The
guidelines are to be part of a user contract and are otherwise to be accessible
at suitable places such as the terminal room. The use of the computer equipment
also requires that the user knows any possible supplementary regulations.
Good Faith
----------
Never leave any doubt as to your identity and give your full name in addition
to explaining your connection to the University. A user is always to identify
him/herself by name, his/her own user identity, password or in another
regular way when using services on the computer network. The goodwill of
external environments is not to be abused by the user accessing information not
intended for the user, or by use of the services for purposes other than that
for which they are intended. Users are to follow the instructions from system
administrators about the use of computer equipment. Users are also expected to
familiarize themselves with the user guides, manuals, documentation etc. in
order to reduce the risk of breakdowns or loss of data or equipment (through
ignorance). On termination of employment or studies, it is the user's
responsibility to ensure that copies of data owned or used by the University
are secured on behalf of the University.
Data Safety
-----------
Users are obliged to take the necessary measures to prevent the loss of data
etc. by taking back-up copies, careful storage of media, etc. This can be done
by ensuring that the systems management take care of it. Your files are in
principle personal but should be protected so that they cannot be read by
others. Users are obliged to protect the integrity of their passwords or other
safety elements known to them, in addition to preventing unauthorized people
from obtaining access to the computer equipment. Introducing data involves the
risk of unwanted elements such as viruses. Users are obliged to take measures
to protect the computer equipment from such things. Users are obliged to report
circumstances that may have importance for the safety or integrity of the
equipment to the closest superior or to the person who is responsible for data
safety.
Respect for Other Users' Privacy
--------------------------------
Users may not try to find out another person's password, etc., nor try to obtain
unauthorized access to another person's data. This is true regardless of
whether or not the data is protected. Users are obliged to familiarize
themselves with the special rules that apply to the storage of personal
information (on others). If a user wishes to register personal information, the
user concerned is obliged to ensure that there is permission for this under the
law for registration of information on persons or rules authorized by the law
or with acceptance of rights given to the University. In cases where
registration of such information is not permitted by these rules the user is
obliged to apply for (and obtain) the necessary permission. Users are bound
by the oaths of secrecy concerning personal relationships of which the user
acquires knowledge through use of computer equipment, ref. to the definition in
section 13 second section of the Administration Law, (forvaltningslovens
section 13 annet ledd).
Proper Use
----------
The computer equipment of the University may not be used to advance slander or
discriminating remarks, nor to distribute pornography or spread secret
information, or to violate the peace of private life or to incite or take part
in illegal actions. Apart from this, users are to refrain from improper
communication on the network.
The computer equipment is to be used in accordance with the aims of the
University. This excludes direct commercial use.
Awareness of the Purposes for Use of Resources
----------------------------------------------
The computer equipment of the University is to strengthen and support
professional activity, administration, research and teaching. Users have a
co-responsibility in making the best possible use of the resources.
Rights
------
Data is usually linked to rights which make their use dependent on agreements
with the holder of the rights. Users commit themselves to respecting other
people's rights. This applies also when the University makes data accessible.
The copying of programs in violation of the rights of use and/or license
agreement is not permitted.
Liability
---------
Users themselves are responsible for the use of data which is made accessible
via the computer equipment. The University disclaims all responsibility for any
loss that results from errors or defects in computer equipment, including for
example, errors or defects in data, use of data from accessible databases or
other data that has been obtained through the computer network etc. The
University is not responsible for damage or loss suffered by users as a
consequence of insufficient protection of their own data.
Surveillance
------------
The systems manager has the right to seek access to the individual user's
reserved areas on the equipment for the purpose of ensuring the equipment's
proper functioning or to control that the user does not violate or has not
violated the regulations in these guidelines. It is presupposed that such
access is only sought when it is of great importance to absolve the University
from responsibility or bad reputation. If the systems manager seeks such
access, the user should be warned about it in an appropriate way. Ordinarily
such a warning should be given in writing and in advance. If the use of a
workstation, terminal or other end user equipment is under surveillance because
of operational safety or other considerations, information about this must be
given in an appropriate way. The systems managers are bound by oaths of secrecy
with respect to information about the user or the user's activity which they
obtain in this way, the exception being that circumstances which could
represent a violation of these guidelines may be reported to superior
authorities.
Sanctions
---------
Breach of these guidelines can lead to the user being denied access to the
University's data services, in addition to which there are sanctions that the
University can order, applying other rules. Breach of privacy laws, oaths of
secrecy etc. can lead to liability or punishment. The usual rules for dismissal
or (forced) resignation of employees or disciplinary measures against students,
apply to users who misuse the computer equipment. The reasons for sanctions
against a user are to be stated, and can be ordered by the person who has
authority given by the University. Disciplinary measures against students are
passed by the University Council, ref. section 47 of the University law.
Complaints
----------
Complaints about sanctions are to be directed to the person(s) who order
sanctions. If the complaint is not complied with, it is sent on to the
University Council for final decision. Complaints about surveillance have the
same procedure as for sanctions. The procedure for complaints about dismissal
or resignation of employees are the usual rules for the University, and rules
otherwise valid in Norwegian society. Decisions about disciplinary measures
against students cannot be complained about, see section 47 of the University law.
# Lustre FS performance tips
## How you can adjust Lustre depending on your application's IO requirements
## Introduction
Lustre is a scalable, high-performance, parallel I/O file system. More
information about Lustre can be found on
[Wikipedia](https://en.wikipedia.org/wiki/Lustre_(file_system)).
When using an additional I/O library (e.g. MPI-IO or HDF5), reading and
writing can be done in parallel from several nodes into a single shared
file.
At the time of writing this guide, Lustre is available on two NOTUR
machines:
- Hexagon (/work)
- Star (/global/work)
All tests were performed on fully loaded machines; the mean values of three
repeats were used.
Lustre terminology and setup:
The MDS (MetaData Server) handles the information about files and
directories. An OSS (Object Storage Server) stores file data on one
or more object storage targets (OSTs). An OST (Object Storage Target) is
responsible for writing and reading the actual data to and from disk.
## Striping
Lustre file striping defines the number of OSTs a file is written
across. At the time of writing, Hexagon has a default of 2 stripes with a
1 MB stripe size, while Star has a default of 1 stripe (no striping). You can
manage striping with the following tools/commands:
- `lfs setstripe` - a command to change striping parameters.
- `lfs getstripe` - a command to get striping information.
- `llapi` - a set of C functions to manipulate striping parameters from C
  programs (`llapi_file_create`, `llapi_file_get_stripe`).
Note that changed striping settings only take effect for newly created
files or for files that are copied (not moved) into the directory.
Examples:

    lfs setstripe --size 2M "dir"    # set the stripe size for "dir" to 2 MB
    lfs setstripe --count 12 "dir"   # stripe each new file created inside "dir" across 12 OSTs
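To verify the result, you can read the settings back with `lfs getstripe`, for example:

    lfs getstripe "dir"   # show the stripe count and stripe size currently set on "dir"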
## Serial IO
This is when a file or a set of files is accessed by one process. Below
are charts representing Hexagon and Star serial IO performance with
different numbers of OSTs and different chunk sizes. On both
machines, to get better IO performance you have to stripe the file
across several OSTs:
On Hexagon, the optimum is to use 2-4 OSTs, depending on the stripe size.
Increasing the chunk size does not affect Hexagon much. This can be related
to the interconnect, where a 1 MB transfer size is the minimum needed to get
optimal performance.
On Star, by using 8 OSTs you can speed up your data IO from the default
25 MiB/s to almost 200 MiB/s! You get maximum performance with a bigger
chunk size and a sufficient number of OSTs (a 32 MB chunk and 32 OSTs will
give you 428 MiB/s).
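As an illustration of the numbers above, a directory on Star intended for large serial writes could be striped like this before the files are created (the directory name is a placeholder):

    lfs setstripe --count 8 --size 32M serial_out   # stripe new files in "serial_out" over 8 OSTs with 32 MB stripes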
## Parallel IO
This is when many processes write into a single shared file. You need to
write at offsets or use a parallel IO library, like MPI-IO or HDF5. On both
machines, the same number of stripes as the number of clients has been
used (up to the maximum number of OSTs).
The general rule, like “many clients – do stripe”, works on both
machines. More specifically:
- Hexagon: a one-to-one ratio of clients to OSTs works fine (up to the
  maximum number of OSTs). Increasing the chunk size does not affect
  performance.
- Star: to get the most out of the file system you have to increase the
  chunk size to 32 MB. With up to 96 clients, stripe over as many OSTs as
  you have clients; above 96 clients, keep the number of OSTs at 96 to
  avoid contention.
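As a sketch of the Star recommendation above, preparing an output directory for a 64-client parallel run could look like this (the path and the client count are placeholders):

    # create the run directory in the global work area
    mkdir -p /global/work/username/run64

    # stripe over as many OSTs as clients (here 64, capped at 96), with a 32 MB stripe size
    lfs setstripe --count 64 --size 32M /global/work/username/run64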
## General Tips
General striping recommendation:
- Many clients and many files: Do NOT stripe.
- Many clients one file: Do stripe.
- Some clients and few large files: Do stripe.
In addition:
- Use parallel IO (HDF5, MPI-IO); this is the only way to take full
  advantage of the Lustre filesystem.
- Open files read-only whenever possible.
- Keep small files on the same OST.
It is highly recommended to read the Lustre IO tips from NICS
(<http://www.nics.tennessee.edu/computing-resources/file-systems/io-lustre-tips>).