Commit 891dc118 authored by Mani Tofigh

Storage content added: 1. storage/file_transfer.md 2. storage/storage.md

parent c8f2c96f
# File Transfer to/from Star

Star supports file transfers primarily through SCP and SFTP, both of which operate over SSH.

## File Transfer Address

Currently, there are no dedicated nodes for file transfers at Star. All transfers should be conducted over the login node using the standard server address:

star.hofstra.edu
## Basic Tools (SCP, SFTP)

Standard SCP and SFTP clients can be used for secure file transfers. Here are the basic commands for using these tools:
ssh star.hofstra.edu
ssh -l <username> star.hofstra.edu
sftp star.hofstra.edu
sftp <username>@star.hofstra.edu
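For copying individual files, `scp` follows the same pattern; the file and directory names below are only placeholders:

scp myfile.dat <username>@star.hofstra.edu:/home/<username>/
scp <username>@star.hofstra.edu:/home/<username>/results.tar.gz .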
## Mounting the File System on Your Local Machine Using SSHFS

Star HPC Cluster allows users to mount remote file systems on their local machines. For Linux, the command would look like this:

sshfs [user@]star.hofstra.edu:[dir] mountpoint [options]

For example:

sshfs yourusername@star.hofstra.edu: /home/yourusername/star-fs/
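When you are done, the mount can be detached again; on most Linux distributions this is done with `fusermount` (the mount point is the one chosen above):

fusermount -u /home/yourusername/star-fs/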
Windows and Mac users can use Cyberduck for similar functionality. WinSCP is another option for Windows, and FileZilla can be used across Windows, Mac, and Linux.
### High-Performance Tools

For large data transfers, performance can vary greatly depending on the source's location and bandwidth. Hofstra does not have unlimited Internet bandwidth, so transfers from external sources may be slower. For high-performance transfers, users are encouraged to use utilities like rsync, which is supported and recommended for its efficiency.
## Subversion and Rsync

Subversion and rsync are also available for transferring files. Rsync is particularly useful and recommended for transferring files to and from the Star HPC Cluster. It provides an efficient way to sync files and directories across different locations while minimizing data transfer.
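As a minimal sketch (the directory names are placeholders), a typical upload of a local folder to your home directory on Star, and a download back, might look like this; `-a` preserves permissions and timestamps, `-v` prints each file as it is transferred, and `-z` compresses data in transit:

rsync -avz ./my_dataset/ <username>@star.hofstra.edu:/home/<username>/my_dataset/
rsync -avz <username>@star.hofstra.edu:/home/<username>/results/ ./results/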
## Guidelines for Large File Transfers

When transferring very large files or datasets, it is advised to use rsync and to calculate and confirm checksums to ensure data integrity.
## Network Interfaces and Bandwidth
All file transfer access to the Star HPC Cluster is currently through the login node's 1GbE interface. Users should be aware of potential bandwidth limitations, especially when transferring large amounts of data.
## User Authentication and Permissions
File transfers are authenticated in the same way as SSH access. SSH keys are the preferred method for secure authentication, although password authentication is currently allowed. Plans for implementing Multi-Factor Authentication (MFA) are being considered for future security enhancements.
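If you have not yet set up key-based authentication, the usual workflow on Linux or macOS looks roughly like this (the ed25519 key type is a common default, not a site requirement):

ssh-keygen -t ed25519
ssh-copy-id <username>@star.hofstra.edu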
# Storage and Backup

## Available File Systems

Star HPC Cluster offers a number of file systems for different storage needs:

- **Home Directories**: Located on the head node, accessible from all nodes via NFS. These directories are not designed for high-performance needs and have a limited capacity.
- **Work/Scratch (Local Storage)**: High-performance storage directly attached to each compute node.
- **Data (HPE PFSS)**: A high-capacity, high-performance shared storage system.
## Home Directories

The home directories on Star serve as personal storage spaces for users. They are globally accessible from both the login nodes and all compute nodes. The size of these directories will be limited to a few gigabytes per user (exact quota to be determined). Users are advised to use these directories for storing essential files and scripts, not for large datasets or computationally intensive tasks.
## Work/Scratch Areas

The Star HPC Cluster provides two types of work/scratch areas:

- **Work/Scratch (Local Storage)**: This high-performance local storage is directly attached to each compute node. It offers several terabytes of capacity without any imposed quota. It is ideal for temporary data storage during computations.
- **Data (HPE PFSS)**: This shared storage solution has a high capacity of 64 terabytes and offers high performance. The quota for this system is a few terabytes per user, which is suitable for larger datasets and critical research data.

Users are encouraged to manage their data efficiently, using the home directories for persistent but small-scale storage needs and the work/scratch spaces for temporary data.
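To see how much space your own directories are using, the standard tools are sufficient; for example:

du -sh $HOME    # total size of your home directory
df -h $HOME     # free space on the file system holding it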
## Backup and Recovery
### Backup Policies
The Star HPC Cluster's backup policies vary across different storage systems:
- **Home Directories**: Backed up daily with a retention period of approximately two weeks, subject to capacity.
- **Work/Scratch**: No backup services provided.
- **Data (HPE PFSS)**: Backup may be available on a per-project basis as determined by specific project requirements.
Users can request backup services as needed, particularly for critical data stored in the Data (HPE PFSS) system.
## Data Archiving
Data archiving services are available on the Star HPC Cluster to comply with NSF requirements for published research data. Specific policies and procedures for data archiving are currently under development and will align with NSF regulations and user demands.
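Until the formal archiving procedures are published, a common practice is to compress completed project data before moving it off the cluster; for example, with `tar` and `xz` (the archive and directory names are placeholders):

tar cJvf project_results.tar.xz project_results/    # create an xz-compressed archive
tar xvf project_results.tar.xz                      # extract it again later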
## Closing of User Account
Upon account closure, users are notified so that they can transfer any essential data they wish to retain. Data stored in the user's spaces will eventually be deleted after the account is terminated.
## Privacy and Security of User Data
The Star HPC Cluster maintains strict policies regarding the privacy and security of user data. Users are responsible for ensuring the confidentiality of their data and are advised not to share their account credentials. The default permissions for new accounts allow user data to be readable by others on the system. Users can easily change these permissions using the `chmod` command to suit their privacy needs. The most commonly used setting is:

- only the user can read their home directory:

  chmod 700 /home/$USER
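If you want to grant access more selectively, two other standard modes come up often (these are generic chmod settings, not a site-specific policy):

chmod 750 /home/$USER    # you have full access; your group can read and enter; others have none
chmod 755 /home/$USER    # you have full access; everyone else can read and enter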
## Support
For assistance with storage and backup issues or any other inquiries, users can contact the Star HPC Cluster support team at Starhpcsupport@hofstra.edu.