Commit 891dc118 authored by Mani Tofigh

Storage content added: 1. storage/file_transfer.md 2. storage/storage.md

parent c8f2c96f
# File Transfer to/from Star

Star supports file transfers primarily through SCP and SFTP, both of which operate over SSH.

## File Transfer Address

Currently, there are no dedicated nodes for file transfers at Star. All transfers should be conducted over the login node using the standard server address:

star.hofstra.edu
## Basic Tools (SCP, SFTP)

Standard SCP and SFTP clients can be used for secure file transfers. Here are the basic commands for using these tools:
ssh star.hofstra.edu
ssh -l <username> star.hofstra.edu
sftp star.hofstra.edu
sftp <username>@star.hofstra.edu
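For copying individual files, `scp` follows the same pattern; the file and directory names below are only placeholders:

scp myfile.dat <username>@star.hofstra.edu:/home/<username>/
scp <username>@star.hofstra.edu:/home/<username>/results.tar.gz .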
## Mounting the File System on Your Local Machine Using SSHFS

Star HPC Cluster allows users to mount remote file systems on their local machines. For Linux, the command would look like this:

sshfs [user@]star.hofstra.edu:[dir] mountpoint [options]

For example:

sshfs yourusername@star.hofstra.edu: /home/yourusername/star-fs/
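When you are done, the mount can be detached again; on most Linux distributions this is done with `fusermount` (the mount point is the one chosen above):

fusermount -u /home/yourusername/star-fs/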
Windows and Mac users can use Cyberduck for similar functionality. WinSCP is another option for Windows, and FileZilla can be used across Windows, Mac, and Linux.
### High-Performance Tools

For large data transfers, performance can vary greatly depending on the source's location and bandwidth. Hofstra does not have unlimited Internet bandwidth, so transfers from external sources may be slower. For high-performance transfers, users are encouraged to use utilities like rsync, which is supported and recommended for its efficiency.
## Subversion and Rsync

Subversion and rsync are also available for transferring files. Rsync is particularly useful and recommended for transferring files to and from the Star HPC Cluster. It provides an efficient way to sync files and directories across different locations while minimizing data transfer.
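As a minimal sketch (the directory names are placeholders), a typical upload of a local folder to your home directory on Star, and a download back, might look like this; `-a` preserves permissions and timestamps, `-v` prints each file as it is transferred, and `-z` compresses data in transit:

rsync -avz ./my_dataset/ <username>@star.hofstra.edu:/home/<username>/my_dataset/
rsync -avz <username>@star.hofstra.edu:/home/<username>/results/ ./results/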
## Guidelines for Large File Transfers

When transferring very large files or datasets, it is advised to use rsync and to calculate and confirm checksums to ensure data integrity.
## Network Interfaces and Bandwidth
All file transfer access to the Star HPC Cluster is currently through the login node's 1GbE interface. Users should be aware of potential bandwidth limitations, especially when transferring large amounts of data.
## User Authentication and Permissions
File transfers are authenticated in the same way as SSH access. SSH keys are the preferred method for secure authentication, although password authentication is currently allowed. Plans for implementing Multi-Factor Authentication (MFA) are being considered for future security enhancements.
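If you have not yet set up key-based authentication, the usual workflow on Linux or macOS looks roughly like this (the ed25519 key type is a common default, not a site requirement):

ssh-keygen -t ed25519
ssh-copy-id <username>@star.hofstra.edu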
# Storage and Backup

## Available File Systems

Star HPC Cluster offers a number of file systems for different storage needs:

- **Home Directories**: Located on the head node, accessible from all nodes via NFS. These directories are not designed for high-performance needs and have a limited capacity.
- **Work/Scratch (Local Storage)**: High-performance storage directly attached to each compute node.
- **Data (HPE PFSS)**: A high-capacity, high-performance shared storage system.
## Home Directories

The home directories on Star serve as personal storage spaces for users. They are globally accessible from both the login nodes and all compute nodes. The size of these directories will be limited to a few gigabytes per user (exact quota to be determined). Users are advised to use these directories for storing essential files and scripts, not for large datasets or computationally intensive tasks.
## Work/Scratch Areas

The Star HPC Cluster provides two types of work/scratch areas:

- **Work/Scratch (Local Storage)**: This high-performance local storage is directly attached to each compute node. It offers several terabytes of capacity without any imposed quota. It is ideal for temporary data storage during computations.
- **Data (HPE PFSS)**: This shared storage solution has a high capacity of 64 terabytes and offers high performance. The quota for this system is a few terabytes per user, which is suitable for larger datasets and critical research data.

Users are encouraged to manage their data efficiently, using the home directories for persistent but small-scale storage needs and the work/scratch spaces for temporary data.
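To see how much space your own directories are using, the standard tools are sufficient; for example:

du -sh $HOME    # total size of your home directory
df -h $HOME     # free space on the file system holding it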
## Backup and Recovery
### Backup Policies
The Star HPC Cluster's backup policies vary across different storage systems:
- **Home Directories**: Backed up daily with a retention period of approximately two weeks, subject to capacity.
- **Work/Scratch**: No backup services provided.
- **Data (HPE PFSS)**: Backup may be available on a per-project basis as determined by specific project requirements.
Users can request backup services as needed, particularly for critical data stored in the Data (HPE PFSS) system.
## Data Archiving
Data archiving services are available on the Star HPC Cluster to comply with NSF requirements for published research data. Specific policies and procedures for data archiving are currently under development and will align with NSF regulations and user demands.
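Until the formal archiving procedures are published, a common practice is to compress completed project data before moving it off the cluster; for example, with `tar` and `xz` (the archive and directory names are placeholders):

tar cJvf project_results.tar.xz project_results/    # create an xz-compressed archive
tar xvf project_results.tar.xz                      # extract it again later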
## Closing of User Account
Upon account closure, users are notified so that they can transfer any essential data they wish to retain. Data stored in the user's spaces will eventually be deleted after the account is terminated.
## Privacy and Security of User Data
The Star HPC Cluster maintains strict policies regarding the privacy and security of user data. Users are responsible for ensuring the confidentiality of their data and are advised not to share their account credentials. The default permissions for new accounts allow user data to be readable by others on the system. Users can easily change these permissions using the `chmod` command to suit their privacy needs. The most commonly used setting is:

- only the user can read their home directory:

  chmod 700 /home/$USER
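If you want to grant access more selectively, two other standard modes come up often (these are generic chmod settings, not a site-specific policy):

chmod 750 /home/$USER    # you have full access; your group can read and enter; others have none
chmod 755 /home/$USER    # you have full access; everyone else can read and enter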
## Support
For assistance with storage and backup issues or any other inquiries, users can contact the Star HPC Cluster support team at Starhpcsupport@hofstra.edu.