Commit c8f2c96f authored by hucscsys's avatar hucscsys

added content from hpc-uit.readthedocs.io

parent 812bc794
source "https://rubygems.org"
gem "jekyll-rtd-theme", "~> 2.0.5"
gem "jekyll-rtd-theme", "~> 2.0.10"
gem "github-pages", group: :jekyll_plugins
# Star HPC Documentation
This area provides guides and resources for using the Star cluster
# Star HPC user documentation
Served via https://docs.starhpc.hofstra.io
## New users
Copyright (c) 2024
Hofstra University
Please see the getting started guide for information on accssing and submitting jobs to the Star cluster.
## Getting help
## License
Contact...
Text is licensed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/),
code examples are provided under the [MIT](https://opensource.org/licenses/MIT) license.
## Locally building the HTML for testing
### Install build tools and dependencies.
Due to Liquid not being updated to work with Ruby 3.2.x, make sure you have Ruby 3.1.x or older installed.
https://talk.jekyllrb.com/t/liquid-4-0-3-tainted/7946/18
To allow building of native extensions, install `ruby-devel`, `gcc`, and `make`.
Install `libxml2`, `libxml2-devel`, `libxslt`, `libxslt-devel`, `libiconv`,
and `libiconv-devel`. These are all dependencies of the `nokogiri` gem, which
is built from source.
Also, at the time of this writing, `libffi-devel` is needed to build ffi:
```
jekyll-rtd-theme was resolved to 2.0.10, which depends on
github-pages was resolved to 209, which depends on
github-pages-health-check was resolved to 1.16.1, which depends on
typhoeus was resolved to 1.4.1, which depends on
ethon was resolved to 0.16.0, which depends on
ffi
```
`bundler` is used for dependency management and installation of gems, based on the
content of `Gemfile`. If you run into any issues with gem depencencies, you may
want to try running `bundle update` or removing `Gemfile.lock` and then running
`bundle install` again.
### Building the site
```
git clone https://github.com/starhpc/docs.git starhpc-docs
cd starhpc-docs
gem install bundler
bundle install
bundle exec jekyll serve
```
Then point your browser to `http://localhost:4000`.
Alternatively, run `bundle exec jekyll build` to build the static site without starting a local web server.
## Getting started with Markdown (kramdown), Jekyll, and GitHub Pages
- https://kramdown.gettalong.org/quickref.html
- https://jekyllrb.com/docs/
- https://docs.github.com/en/pages/setting-up-a-github-pages-site-with-jekyll/testing-your-github-pages-site-locally-with-jekyll
---
sort: 3
---
# Account
{% include list.liquid all=true %}
# Getting an account
## Getting started on the machine (account, quota, password)
Before you start using the Star cluster, please read Hofstra University's [Acceptable Use Guidelines](http://www.hofstra.edu/scs/aug).
A general introduction to the Star HPC cluster, research community, and support group be found at [starhpc.hofstra.io](https://starhpc.hofstra.io).
## How to get an account and a CPU quota on Star
To be able to work on the Star cluster, you must have an account and you must have been granted CPU time on the system. There are two ways to achieve this:
### Hofstra Quota
“Local users” (i.e. users from Hofstra or affiliated with Hofstra in some
way) can apply for an account and a quota from Hofstra’s share of Star. If
you want to apply for such a quota follow the instructions here:
* [How to get a local account on Star](huquota.html)
### Research Council Quota
Regional users (including users from Hofstra) may apply for an account and a
CPU quota from the Research Council's share of Star. If you want to
apply for such a quota please use this [regional quota form](https://www.example.com/application/project/).
.. _accounting:
=============================
CPU-hour quota and accounting
=============================
CPU quota
=========
To use the batch system you have to have a cpu quota, either local or
national. For every job you submit we check that you have sufficient
quota to run it and you will get a warning if you do not have sufficient
cpu-hours to run the job. The job will be submitted to queue, but will
not start until you have enough cpu-hours to run it.
Resource charging
=================
We charge for used resources, both cpu and memory.
The accounting system charges for used processor equivalents (PE)
times used walltime, so if you ask for more than (total memory pr. node)/(total cores pr. node)
- in the example below 32GB/20cores = 1.6 GB.
The concept of PE defines a processor equivalent as the resource unit 1 core and 1 unit of memory.
For a node with 2 GB og memory pr core, i.e. 32 GB of memory and 16 cores - 1 PE equals to 1 core and
2GB of memory. Currently there is no common definition on the memory unit, other than the one specified
above.
The best way to explain the concept of PE is by example:
Assume that you have a node with 20 cpu-cores and 32 GB memory::
if you ask for less than 1.6GB memory per core then PE will equal the cpu count.
if you ask for 3.2GB memory per core then PE will be twice the cpu-count.
if you ask for 32 GB memory then PE=20 as you only can run one cpu per compute node.
The available memory and core settings for Stallo are explained here: :ref:`about_stallo`
Inspecting your quota
=====================
You can use the cost command to check how much cpu-hours are left on
your allocation:
::
[user@stallo-1 ~]$ cost
Id Name Amount Reserved Balance CreditLimit Deposited Available Percentage
--- ------- --------- -------- -------- ----------- --------- --------- ----------
272 nnXXXXk 168836.65 96272.00 72564.65 0.00 290000.00 72564.65 58.22
10 nnYYYYk 4246.04 0.00 4246.04 0.00 150000.00 4246.04 2.83
The column meaning is
Amount:
The number of hours available for new jobs.
Reserved:
The number of hours reserved for currently running jobs.
Balance:
Amount - Reserved.
CreditLimit:
Allocated low-pri quota.
Deposited:
Allocated normal quota
Inspecting historic use
-----------------------
You can view the accounting history of your projects using::
gstatement --hours --summarize -s YYYY-MM-DD -e YYYY-MM-DD -p nnXXXXk
for more detail see::
gstatement --man
.. _login:
=============================
Logging in for the first time
=============================
Log in with SSH
===============
An *SSH* client (Secure SHell) is the required tool to connect to Stallo. An *SSH* client provides secure encrypted communications between two hosts over an insecure network.
If you already have *ssh* installed on your UNIX-like system, have a user account and password on a Notur system, login may be as easy as typing
::
ssh <machine name> (for instance: ssh stallo.uit.no)
into a terminal window.
If your user name on the machine differs from your user name on the local machine, use the -l option to specify the user name on the machine to which you connect. For example:
::
ssh <machine name> -l [username]
And if you need X-forwarding (for instance, if you like to run Emacs in it's own window) you must log in like this:
::
ssh -X -Y <machine name>
No matter how you login, you will need to confirm that the connection shall be trusted. The SHA256 key fingerprint of ``stallo.uit.no`` is:
::
SHA256:YJpwZ91X5FNXTc/5SE1j9UR1UAAI4FFWVwNSoWoq6Hc
So you should get precisely this message the first time you login via ssh:
::
The authenticity of host 'stallo.uit.no (129.242.2.68)' can't be established.
RSA key fingerprint is SHA256:YJpwZ91X5FNXTc/5SE1j9UR1UAAI4FFWVwNSoWoq6Hc.
Are you sure you want to continue connecting (yes/no)?
If you see this message with precisely this code, you can continue by typing ``yes`` and pressing *Enter*. If you connect to Stallo for the first time and ssh does *not* show you this key, please contact support@metacenter.no immediately.
Log in with an ssh key
----------------------
To avoid entering your password every time you login and to increase security, you can log in with an ssh keypair. This keypair consists of a private key that you have to store on your computer and a public key that you can store on Stallo. On Linux or OSX simply type:
::
ssh-keygen
and follow the instructions on the screen. Please use a good passphrase. You will have to enter this passphrase the first time you log in after a reboot of the computer, but not anymore afterwards. To copy the public key to Stallo, type:
::
ssh-copy-id <username>@stallo.uit.no
To learn more about ssh keys, have a look at `this <https://wiki.archlinux.org/index.php/SSH_keys>`_ page.
On Windows, you can use PuTTYgen that comes with PuTTY. More information on `ssh.com <https://www.ssh.com/ssh/putty/windows/puttygen>`_.
SSH clients for Windows and Mac
-------------------------------
At the `OpenSSH page <https://www.openssh.com>`_ you will find several *SSH* alternatives for both Windows and Mac.
Please note that Mac OS X comes with its own implementation of *OpenSSH*, so you don't need to install any third-party software to take advantage of the extra security *SSH* offers. Just open a terminal window and jump in.
Learning more about SSH
-----------------------
To learn more about using SSH, please also consult the `OpenSSH page <https://www.openssh.com>`_ page and take a look at the manual page on your system (*man ssh*).
Obtain a new password
=====================
When you have been granted an account on stallo.uit.no, your username and password is sent to you separately.
The username by email and the password by SMS. The password you receive by SMS is temporally and only valid for 7 days.
The passwd command does not seem to work. My password is reset back to the old one after a while. Why is this happening?
------------------------------------------------------------------------------------------------------------------------
The Stallo system is using a centralised database for user management.
This will override the password changes done locally on Stallo.
The password can be changed `here <https://www.metacenter.no/user/password/>`_, log in using your
username on stallo and the NOTUR domain.
Logging on the compute nodes
============================
Information on how to log in on a compute node.
Some times you may want to log on a compute node (for instance to check
out output files on the local work area on the node), and this is also
done by using SSH. From stallo.uit.no you can log in to
compute-x-y the following way:
::
ssh -Y compute-x-y (for instance: ssh compute-5-8)
or short
::
ssh -Y cx-y (for instance: ssh c5-8)
If you don't need display forwarding you can omit the "-Y" option
above.
If you for instance want to run a program interactively on a compute
node, and with display forwarding to your desktop you should in stead do
something like this:
#. first login to Stallo with display forwarding,
#. then you should reserve a node through the
queuing system
Below is an example on how you can do this:
::
1) Log in on Stallo with display forwarding.
$ ssh -Y stallo.uit.no
2) Reserve and log in on a compute node with display forwarding.
(start an interactive job.)
$ srun -N 1 -t 1:0:0 --pty bash -i
3) Open a new terminal window, type squeue -j <jobid> (it shows you which node(s) was allocated
to that specific job). Then ssh -Y <nodename> to that node and start your preferred gui.
Graphical logon to Stallo
=========================
If you want to run applications with graphical user interfaces we recommend you to use the
`remote desktop service <http://stallo-gui.uit.no/vnc/>`_
on Stallo.
If you have a new account and you have never logged in on Stallo before, first log in with a classical ssh connection (see above). Afterwards you can use the graphical logon. Otherwise it can happen that your /home will not be created and you will get an error message of the type: "Could not chdir to home directory /home/your_user_name: No such file or directory"
**Important:**
If you are connecting from outside the networks of UNINETT and partners you need to log into
stallo-gui.uit.no with ssh to verify that you have a valid username and password on the Stallo system.
After a successful login the service will automatically allow you to connect from the ip-address
your client currently has. The connection will be open for at least 5 minutes after you log in.
There is no need to keep the ssh-connection open after you have connected to the remote desktop,
in fact you will be automatically logged off after 5 seconds.
*************************************
How to get a local account on Stallo
*************************************
To get a local account on Stallo, you need to provide us with:
--------------------------------------------------------------
* Your full name, date of birth, and nationality.
* Your position (master student, PhD, PostDoc, staff member, visitor/guest).
* Your mobile phone number. This is necessary for recovery of passwords.
* Your institutional mail address (i.e. your work email at the research institution to which you belong)
* The name and address of the instruction you belong to; also including name of the center, institute etc.
* A preferred username. Note: If you already have a Notur user account name, please enter that one. A username is defined as a sequence of two to thirty lowercase alphanumeric characters where the first letter may only be a lowercase letter.
* Necessary additional group and account memberships.
* Optional: If you know in advance, please let us know: how many CPU hours you expect to use, how much long-term storage space (GB) you will need, and what software you will use. Partial answers are also welcome. The more we know about the needs of our users, the better services we can provide and the better we can plan for the future.
**If you are a staff member and need to get a local project,** we need information about the project:
* Name of the project
* Brief description of the project
* Field of science for the project
* Name of additonal members of the project
**If you are a student, PhD or post-doc,** you need to also provide us with the name of your advisor and name of the project you are to be a member of.
Compile this list in a proper manner and send to support@metacenter.no with a comment that the application is for a local account on Stallo.
---
sort: 7
---
# Code Development
{% include list.liquid all=true %}
# Compilers
The default development environment on Star is provided by Intel
Cluster Studio XE. In general users are advised to use the Intel
compilers and MKL performance libraries, since they usually give the
best performance.
## Fortran compilers
The recommended Fortran compiler is the Intel Fortran Compiler: ifort.
The gfortran compiler is also installed, but we do not recommend it for
general usage.
## Usage of the Intel ifort compiler
For plain Fortran codes (all Fortran standards) the general form for
usage of the Intel ifort compiler is as follows:
$ ifort [options] file1 [file2 ...]
where options represents zero or more compiler options, and fileN is a
Fortran source (.f .for .ftn .f90 .fpp .F .FOR .F90 .i .i90), assembly
(.s .S), object (.o), static library (.a), or an other linkable file.
Commonly used options may be placed in the ifort.cfg file.
The form above also applies for Fortran codes parallelized with OpenMP
([www.openmp.org](http://www.openmp.org/),
[Wikipedia](https://en.wikipedia.org/wiki/OpenMP)); you only have to
select the necessary compiler options for OpenMP.
For Fortran codes parallelized with MPI the general form is quite
similar:
$ mpif90 [options] file1 [file2 ...]
The wrapper mpif90 is using the Intel ifort compiler and invokes all the
necessary MPI machinery automatically for you. Therefore, everything
else is the same for compiling MPI codes as for compiling plain Fortran
codes.
## C and C++ compilers
The recommended C and C++ compilers are the Intel Compilers; icc (C) and
icpc (C++). The gcc and g++ compilers are also installed, but we do not
recommend them for general usage due to performance issues.
## Usage of the Intel C/C++ compilers
For plain C/C++ codes the general form for usage of the Intel icc/icpc
compilers are as follows:
$ icc [options] file1 [file2 ...] # for C
$ icpc [options] file1 [file2 ...] # for C++
where options represents zero or more compiler options, fileN is a C/C++
source (.C .c .cc .cpp .cxx .c++ .i .ii), assembly (.s .S), object (.o),
static library (.a), or other linkable file. Commonly used options may
be placed in the icc.cfg file (C) or the icpc.cfg (C++).
The form above also applies for C/C++ codes parallelized with OpenMP
([www.openmp.org](http://www.openmp.org/),
[Wikipedia](https://en.wikipedia.org/wiki/OpenMP)); you only have to
select the necessary compiler options for OpenMP.
For C/C++ codes parallelized with MPI the general form is quite similar:
$ mpicc [options] file1 [file2 ...] # for C when using OpenMPI
$ mpiCC [options] file1 [file2 ...] # For C++ when using OpenMPI
Both mpicc and mpiCC are using the Intel compilers, they are just
wrappers that invoke all the necessary MPI machinery automatically for
you. Therefore, everything else is the same for compiling MPI codes as
for compiling plain C/C++ codes.
# Debugging
## Debugging with TotalView
TotalView is a graphical, source-level, multiprocess debugger. It is the
primary debugger on Star and has excellent capabilities for debugging
parallel programs. When using this debugger you need to turn on
X-forwarding, which is done when you login via ssh. This is done by
adding the -Y on newer ssh version, and -X on older:
$ ssh -Y username@star.uit.no
The program you want to debug has to be compiled with the debug option.
This is the "-g" option, on Intel and most compilers. The executable
from this compilation will in the following examples be called
"filename".
First, load the totalview module to get the correct environment
variables set:
$ module load TotalView
To start debugging run:
$ totalview MyProg
Which will start a graphical user interface.
Once inside the debugger, if you cannot see any source code, and keep
the source files in a separate directory, add the search path to this
directory via the main menu item File-\>Search path.
Source lines where it is possible to insert a breakpoint are marked with
a box in the left column. Click on a box to toggle a breakpoint.
Double clicking a function/subroutine name in a source file should open
the source file. You can go back to the previous view by clicking on the
left arrow on the top of the window.
The button "Go" runs the program from the beginning until the first
breakpoint. "Next" and "Step" takes you one line / statement forward.
"Out" will continue until the end of the current subroutine/function.
"Run to" will continue until the next breakpoint.
The value of variables can be inspected by right clicking on the name,
then choose "add to expression list". The variable will now be shown in
a pop up window. Scalar variables will be shown with their value, arrays
with their dimensions and type. To see all values in the array, right
click on the variable in the pop up window and choose "dive". You can
now scroll through the list of values. Another useful option is to
visualize the array: after choosing "dive", open the menu item
"Tools-\>Visualize" of the pop up window. If you did this with a 2D
array, use middle button and drag mouse to rotate the surface that
popped up, shift+middle button to pan, Ctrl+middle button to zoom
in/out.
Running totalview on an interactive node:
$ mkdir -p /global/work/$USER/test_dir
$ cp $HOME/test_dir/a.out /global/work/$USER/test_dir
$ cd /global/work/$USER/test_dir
$ module load TotalView
$ totalview a.out
Replace \[#procs\] with the core-count for the job.
A window with name "New Program" should pop up. Under "Program" write
the name of the executable. Under "Parallel" choose "Open MPI" and
"Tasks" is the number of cores you are using (\[#procs\]).
You can also start Totalview with:
$ mpirun -tv a.out
The users guide and the quick start quide for Totalview can be found on
the [RogueWave documentation
page](https://support.roguewave.com/documentation/tvdocs/en/current).
## Debugging with Valgrind
- A memory error detector
- A thread error detector
- A MPI error detector
- A cache and branch-prediction profiler
- A call-graph generating cache profiler
- A heap profiler
- Very easy to use
Valgrind is the best thing for developers since the invention of
pre-sliced bread!
Valgrind works by emulating one or more CPUs. Hence it can intercept and
inspect your unmodified, running program in ways which would not be
otherwise possible. It can for example check that all variables have
actually been assigned before use, that all memory references are within
their allowed space, even for static arrays and arrays on the stack.
What makes Valgrind so extremely powerful is that it will tell exactly
where in the program a problem, error or violation occurred. It will
also give you information on how many allocates/deallocates the program
performed, and whether there is any unreleased memory or memory leaks at
program exit. In fact, it goes even further and will tell you on what
line the unreleased/leaking memory was allocated. The cache profiler
will give you information about cache misses and where they occur.
The biggest downside to Valgrind is that it will make your program run
much slower. How much slower depends on what kind of, and how much,
information you request. Typically the program will run 10-100 times
slower under Valgrind.
Simply start Valgrind with the path to the binary program to be tested:
$ module load Valgrind
$ valgrind /path/to/prog
This runs Valgrind with the default "tool" which is called memcheck,
which checks memory consistency. When run without any extra flags,
Valgrind will produce a balanced, not overly detailed and informative
output. If you need a more detailed (but slower) report, run Valgrind
with:
$ valgrind --leak-check=full --track-origins=yes --show-reachable=yes /path/to/prog
Of course, if you want to get all possible information about where in
the program something was inconsistent you must compile the program with
debugging flags switched on.
If you have a multi-threaded program (e.g. OpenMP, pthreads), and you
are unsure if there might be possible deadlocks or data races lurking in
your program, the Valgrind thread checker is your best friend. The
thread checking tool is called helgrind:
$ export OMP_NUM_THREADS=2
$ valgrind --tool=helgrind /path/to/prog
For more information on using Valgrind please refer to the man pages and
the Valgrind manual which can be found on the Valgrind website:
<http://www.valgrind.org>
# Environment modules
The user's environment, and which programs are available for immediate
use, is controlled by the `module` command. Many development libraries
are dependant on a particular compiler versions, and at times a specific
MPI library. When loading and/or unloading a compiler, `module`
automatically unloads incompatible modules and, if possible, reloads
compatible versions.
Currently, not all libraries in all combinations of all compilers and
MPI implementations are supported. By using the default compiler and MPI
library, these problems can be avoided. In the future, we aim to
automate the process so that all possible (valid) permutations are
allowed.
Read the `module_scheme` section for an introduction on how to use
modules.
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=0-00:10:00
module load perf-reports/5.1
# create temporary scratch area for this job on the global file system
SCRATCH_DIRECTORY=/global/work/$USER/$SLURM_JOBID
mkdir -p $SCRATCH_DIRECTORY
# run the performance report
# all you need to do is to launch your "normal" execution
# with "perf-report"
cd $SCRATCH_DIRECTORY
perf-report mpiexec -n 20 $SLURM_SUBMIT_DIR/example.x
# perf-report generates summary files in html and txt format
# we copy result files to submit dir
cp *.html *.txt $SLURM_SUBMIT_DIR
# clean up the scratch directory
cd /tmp
rm -rf $SCRATCH_DIRECTORY
# Profiling and optimization
In general, in order to reach performances close to the theoretical
peak, it is necessary to write your algorithms in a form that allows the
use of scientific library routines, such as BLACS/LAPACK.
## Arm Performance Reports
[Arm Performance
Reports](https://www.arm.com/products/development-tools/hpc-tools/cross-platform/performance-reports)
offers a nice and convenient way to get an overview profile for your run
very quickly. It will introduce a typically negligible runtime overhead
and all you need to do is to load the `perf-reports` module and to
launch your "normal" execution using the `perf-report` launcher.
Here is an example script:
<div class="literalinclude" language="bash" linenos="">
files/perf-reports.sh
</div>
What we do there is to profile an example binary located in
`$SLURM_SUBMIT_DIR/example.x`.
The profiler generates summary files in html and txt format and this is
how an example html summary can look (open it in your browser):
![image](files/perf-reports.jpg)
## Performance tuning by Compiler flags
### Quick and dirty
Use `ifort/icc -O3`. We usually recommend that you use the `ifort/icc`
compilers as they give superior performance on Star. Using `-O3` is a
quick way to get reasonable performance for most applications.
Unfortunately, sometimes the compiler break the code with `-O3` making
it crash or give incorrect results. Try a lower optimization, `-O2` or
`-O1`, if this doesn't help, let us know and we will try to solve this
or report a compiler bug to INTEL. If you need to use `-O2` or `-O1`
instead of `-O3` please remember to add the `-ftz` too, this will flush
small values to zero. Doing this can have a huge impact on the
performance of your application.
### Profile based optimization
The Intel compilers can do something called profile based optimization.
This uses information from the execution of the application to create
more effective code. It is important that you run the application with a
typical input set or else the compiler will tune the application for
another usage profile than you are interested in. With a typical input
set one means for instance a full spatial input set, but using just a
few iterations for the time stepping.
1. Compile with `-prof-gen`.
2. Run the app (might take a long time as optimization is turned off in
this stage).
3. Recompile with `-prof-use`. The simplest case is to
compile/run/recompile in the same catalog or else you need to use
the `-prof-dir` flag, see the manual for details.
## Vtune
Intel Vtune Amplifier is a versatile serial and parallel profiler, with
features such as stack sampling, thread profiling and hardware event
sampling.
## Totalview
[Totalview](https://support.roguewave.com/documentation/tvdocs/en/current)
is a source- and machine-level debugger for multi-process,
multi-threaded programs.
---
sort: 1
sort: 100
---
# Guides
......
---
sort: 2
---
# Getting Help
{% include list.liquid all=true %}
# Contact
If you need help, please file a support request via support@starhpc.hofstra.io,
and our local team of experts will try to assist you as soon as possible.
![rtfm]({{ site.baseurl }}/help/rtfm.png "rtfm")
Credit: https://xkcd.com/293/
This diff is collapsed.
#!/usr/bin/env bash
# Fake some work, $1 is the task number.
# Change this to whatever you want to have done.
# sleep between 0 and 10 secs
let sleeptime=10*$RANDOM/32768
echo "Task $1 is sleeping for $sleeptime seconds"
sleep $sleeptime
echo "Task $1 has slept for $sleeptime seconds"
#!/usr/bin/env bash
# Jobscript example that can run several tasks in parallel.
# All features used here are standard in bash so it should work on
# any sane UNIX/LINUX system.
# Author: roy.dragseth@uit.no
#
# This example will only work within one compute node so let's run
# on one node using all the cpu-cores:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
# We assume we will (in total) be done in 10 minutes:
#SBATCH --time=0-00:10:00
# Let us use all CPUs:
maxpartasks=$SLURM_TASKS_PER_NODE
# Let's assume we have a bunch of tasks we want to perform.
# Each task is done in the form of a shell script with a numerical argument:
# dowork.sh N
# Let's just create some fake arguments with a sequence of numbers
# from 1 to 100, edit this to your liking:
tasks=$(seq 100)
cd $SLURM_SUBMIT_DIR
for t in $tasks; do
# Do the real work, edit this section to your liking.
# remember to background the task or else we will
# run serially
./dowork.sh $t &
# You should leave the rest alone...
# count the number of background tasks we have spawned
# the jobs command print one line per task running so we only need
# to count the number of lines.
activetasks=$(jobs | wc -l)
# if we have filled all the available cpu-cores with work we poll
# every second to wait for tasks to exit.
while [ $activetasks -ge $maxpartasks ]; do
sleep 1
activetasks=$(jobs | wc -l)
done
done
# Ok, all tasks spawned. Now we need to wait for the last ones to
# be finished before we exit.
echo "Waiting for tasks to complete"
wait
echo "done"
# Open Question & Answer Sessions for All Users
## Learn new tricks and ask & discuss questions
Meet the HPC staff, discuss problems and project ideas, give feedback or
suggestions on how to improve services, and get advice for your
projects.
Join us on [Zoom](https://uit.zoom.us/j/65284253551), get yourself a
coffee or tea and have a chat. It doesn’t matter from which institute or
university you are, you are **welcome to join at any time**.
This time we will start with a **short seminar about “Helpful Tools &
Services we offer you might not know about”**. Afterwards we have the
open question and answer session.
We can talk about: - Problems/questions regarding the Star
shutdown and migration - Questions regarding data storage and
management - Help with programming and software management - Help with
project organization and data management - Anything else If you think
you might have a challenging question or topics for us, you can also
send them to us before, so we can come prepared and help you better. If
you have general or specific questions about the event:
<jorn.dietze@uit.no>
## Next event
- **2020-10-13, 13:00 - 15:00**, [online Zoom
meeting](https://uit.zoom.us/j/65284253551)
### Past events
- 2020-10-10, 13:00 - 15:00, online Zoom meeting
- 2020-04-23, 13:00 - 15:00, online Zoom meeting
- 2020-02-19, 10:00 - 12:00, [main kantina](http://bit.ly/36Fhd9y)
- 2019-11-12, 14:00 - 16:00, [main kantina](http://bit.ly/36Fhd9y)
- 2019-09-11, 10:00 - 12:30, MH bygget atrium
### Similar events which serve as inspiration
- <https://scicomp.aalto.fi/news/garage.html>
- <https://openworking.wordpress.com/2019/02/04/coding-problems-just-pop-over/>
# Support staff
## Our vision
Give our users a competitive advantage in the international science race.
### Goal
Contribute to make our scientists the most efficient users of High Performance
Computing (HPC), by creating the best support environment for any HPC user,
and present them to the most efficient solutions to support their highest
demands in HPC.
### Services
The Star HPC support group works within three areas:
* Technical and operational computing resource support.
* Basic to advanced user support for projects and research groups that
utilize HPC or is expecting to do so in the future.
### Regional resource site
Since 2000 Hofstra has been a Resource Site in the National High Performance
Computing Consortium (NOTUR), a joint mission between the four Norwegian
Universities in Bergen, Oslo, Trondheim and Tromsø. Since December 2014
UNINETT Sigma2 manages the national infrastructure and offers High Performance
Computing and Data Storage.
The HPC group at Hofstra also has a role to play as a regional resource site
within HPC, offering HPC services to other institutions in Northern Norway. A
strategic collaboration between Hofstra and the Norwegian Polar Institute within
HPC has been ongoing since 1998.
## Operations Team
* Alexander Rosenberg
* Mani Tofigh
## The Board
* Edward H. Currie
* Daniel P. Miller
* Adam C. Durst
* Jason D. Williams
* Thomas G. Re
* Oren Segal
* John Ortega
# How to write good support requests
Writing good support requests is not only good for the support team, it
is also better for you! Due to us having lots of help requests, the time
to understand each problem is at a premium. The easier it is to
understand yours, the faster we will get to it. Below is a list of good
practices.
## Never send support requests to staff members directly
Always send them to <support@metacenter.no> and staff members will pick
them up there. On <support@metacenter.no> they get tracked and have
higher visibility. Some staff members work on the support line only part
time. Sending the request to <support@metacenter.no> makes sure that
somebody will pick it up. Please note in the request that it is about
Star, as there are more clusters that are handled with this email
address.
## Do not manhandle us as simple "Let me Google that for you" assistants
Seriously: have you searched the internet with the exact error message
and the name of your application...? Other scientists may have had the
very same problem and might have been successful in solving it. By the
way: that's almost always how we start to research, too...
## Give descriptive subject
Your subject line should be descriptive. "Problem on Star" is not a
good subject line since it could be valid for basically every support
email we get. The support staff is a team. The subjects are the first
thing that we see. We would like to be able to classify emails according
to subjects before even opening the email.
## Include actual commands and error messages
We cover this below, but it's so important it needs to be mentioned at
the top, too: include the actual command you run and actual error
messages. Copy and paste. If you don't include this, we will be slightly
annoyed and immediately ask this.
Please, do not screen shoot your ssh terminal and send us pictures (jpg,
png, tiff...) of what you saw on your monitor! From these, we would be
unable to cut & paste commands or error messages, unnecessarily slowing
down our research on your problem. Your sample output does not need at
all to "look good", and we don't need to know what fancy ssh- or
terminal software you have: a simple text-based cut & paste directly
into the mail or ticket is the best we can work on with.
## New problem--new email
Do not send support requests by replying to unrelated issues. Every
issue gets a number and this is the number that you see in the subject
line. Replying to unrelated issues means that your email gets filed
under the wrong thread and risks being overlooked.
## The XY problem
This is a classic problem. Please read <http://xyproblem.info>. Often we
know the solution but sometimes we don't know the problem.
In short (quoting from <http://xyproblem.info>):
- User wants to do X.
- User doesn't know how to do X, but thinks they can fumble their way
to a solution if they can just manage to do Y.
- User doesn't know how to do Y either.
- User asks for help with Y.
- Others try to help user with Y, but are confused because Y seems
like a strange problem to want to solve.
- After much interaction and wasted time, it finally becomes clear
that the user really wants help with X, and that Y wasn't even a
suitable solution for X.
To avoid the XY problem, if you struggle with Y but really what you are
after is X, please also tell us about X. Tell us what you really want to
achieve. Solving Y can take a long time. We have had cases where after
enormous effort on Y we realized that the user wanted X and that Y was
not the best way to achieve X.
## Tell us also what worked
Rather often we get requests of the type "I cannot get X to run on two
nodes". The request then does not mention whether it worked on one node
or on one core or whether it never worked and that this was the first
attempt. Perhaps the problem has even nothing to do with one or two
nodes. In order to better isolate the problem and avoid wasting time
with many back and forth emails, please tell us what actually worked so
far. Tell us what you have tried to isolate the problem. This requires
some effort from you but this is what we expect from you.
## Specify your environment
Have you or your colleague compiled the code? Which modules were loaded?
If you use non-default modules and you do not tell us about it, we will
waste time when debugging with in a different environment.
## Simple cases: Be specific, include commands and errors
Whatever you do, don't say that "X didn't work". Exactly give the
commands you ran, environment (see above), and output error messages.
The actual error messages mean a lot - include all the output, it is
easy to copy and paste it.
The better you describe the problem the less we have to guess and ask.
Sometimes, just seeing the actual error message is enough to give an
useful answer. For all but the simplest cases, you will need to make the
problem reproducible, which you should *always* try anyway. See the
following points.
## Complex cases: Create an example which reproduces the problem
Create an example that we can ideally just copy paste and run and which
demonstrates the problem. It is otherwise very time consuming if the
support team needs to write input files and run scripts based on your
possibly incomplete description. See also next point. Make this example
available to us. We do not search and read read-protected files without
your permission.
## Make the example as small and fast as possible
You run a calculation which crashes after running for one week. You are
tempted to write to support right away with this example but this is not
a good idea. Before you send a support request with this example, first
try to reduce it. Possibly and probably the crash can be reproduced with
a much smaller example (smaller system size or grid or basis set). It is
so much easier to schedule and debug a problem which crashes after few
seconds compared to a run which crashes after hours. Of course this
requires some effort from you but this is what we expect from you in
order to create a useful support request. Often when isolating the
problem, the problem and solution crystallize before even writing the
support request.
# Star HPC Documentation
This area provides guides and resources for using the Star cluster
## New users
Please see the getting started guide for information on accssing and submitting jobs to the Star cluster.
## Getting help
Contact...
---
sort: 4
---
# Jobs
source: `{{ page.path }}`
# Batch system
The Star system is a resource that is shared between many of users and
to ensure fair use everyone must do their computations by submitting
jobs through a batch system that will execute the applications on the
available resources.
The batch system on Star is [SLURM](https://slurm.schedmd.com/)
(Simple Linux Utility for Resource Management.)
## Creating a job script
To run a job on the system you need to create a job script. A job script
is a regular shell script (bash) with some directives specifying the
number of CPUs, memory, etc., that will be interpreted by the batch
system upon submission.
You can find job script examples in `job_script_examples`.
After you wrote your job script as shown in the examples, you can start
it with:
sbatch jobscript.sh
## How to pass command-line parameters to the job script
It is sometimes convenient if you do not have to edit the job script
every time you want to change the input file. Or perhaps you want to
submit hundreds of jobs and loop over a range of input files. For this
it is handy to pass command-line parameters to the job script. For an
overview of the different possible parameters, see `slurm_parameter`.
In SLURM you can do this:
$ sbatch myscript.sh myinput myoutput
And then you can pick the parameters up inside the job script:
#!/bin/bash
#SBATCH ...
#SBATCH ...
...
# argument 1 is myinput
# argument 2 is myoutput
mybinary.x < ${1} > ${2}
For recommended sets of parameters see also `slurm_recommendations`.
## Walltime
We recommend you to be as precise as you can when specifying the
parameters as they will inflict on how fast your jobs will start to run.
We generally have these rules for prioritizing jobs:
1. Large jobs, that is jobs with high CPUcounts, are prioritized.
2. Short jobs take precedence over long jobs.
3. Use fairshare. This means that users with many jobs running will get
a decreased priority compared to other users.
To find out whether all users within one project share the same
priority, run:
$ sshare -a -A nnNNNNk
For a given account (project) consider the column "RawShares". If the
RawShares for the users is "parent", they all share the same fairshare
priority. If it is a number, they have individual priorities.
# Dos and don'ts
- Never run calculations on the home disk
- Always use the SLURM queueing system
- The login nodes are only for editing files and submitting jobs
- Do not run calculations interactively on the login nodes
# Job script examples
## Basic examples
### General blueprint for a jobscript
You can save the following example to a file (e.g. run.sh) on Star.
Comment the two `cp` commands that are just for illustratory purpose
(lines 46 and 55) and change the SBATCH directives where applicable. You
can then run the script by typing:
$ sbatch run.sh
Please note that all values that you define with SBATCH directives are
hard values. When you, for example, ask for 6000 MB of memory
(`--mem=6000MB`) and your job uses more than that, the job will be
automatically killed by the manager.
<div class="literalinclude" language="bash">
files/slurm-blueprint.sh
</div>
### Running many sequential jobs in parallel using job arrays
In this example we wish to run many similar sequential jobs in parallel
using job arrays. We take Python as an example but this does not matter
for the job arrays:
<div class="literalinclude" language="python">
files/test.py
</div>
Save this to a file called "test.py" and try it out:
$ python test.py
start at 15:23:48
sleep for 10 seconds ...
stop at 15:23:58
Good. Now we would like to run this script 16 times at the same time.
For this we use the following script:
<div class="literalinclude" language="bash">
files/slurm-job-array.sh
</div>
Submit the script and after a short while you should see 16 output files
in your submit directory:
$ ls -l output*.txt
-rw------- 1 user user 60 Oct 14 14:44 output_1.txt
-rw------- 1 user user 60 Oct 14 14:44 output_10.txt
-rw------- 1 user user 60 Oct 14 14:44 output_11.txt
-rw------- 1 user user 60 Oct 14 14:44 output_12.txt
-rw------- 1 user user 60 Oct 14 14:44 output_13.txt
-rw------- 1 user user 60 Oct 14 14:44 output_14.txt
-rw------- 1 user user 60 Oct 14 14:44 output_15.txt
-rw------- 1 user user 60 Oct 14 14:44 output_16.txt
-rw------- 1 user user 60 Oct 14 14:44 output_2.txt
-rw------- 1 user user 60 Oct 14 14:44 output_3.txt
-rw------- 1 user user 60 Oct 14 14:44 output_4.txt
-rw------- 1 user user 60 Oct 14 14:44 output_5.txt
-rw------- 1 user user 60 Oct 14 14:44 output_6.txt
-rw------- 1 user user 60 Oct 14 14:44 output_7.txt
-rw------- 1 user user 60 Oct 14 14:44 output_8.txt
-rw------- 1 user user 60 Oct 14 14:44 output_9.txt
### Packaging smaller parallel jobs into one large parallel job
There are several ways to package smaller parallel jobs into one large
parallel job. The preferred way is to use Job Arrays. Browse the web for
many examples on how to do it. Here we want to present a more pedestrian
alternative which can give a lot of flexibility.
In this example we imagine that we wish to run 5 MPI jobs at the same
time, each using 4 tasks, thus totalling to 20 tasks. Once they finish,
we wish to do a post-processing step and then resubmit another set of 5
jobs with 4 tasks each:
<div class="literalinclude" language="bash">
files/slurm-smaller-jobs.sh
</div>
The `wait` commands are important here - the run script will only
continue once all commands started with `&` have completed.
### Example on how to allocate entire memory on one node
<div class="literalinclude" language="bash">
files/slurm-big-memory.sh
</div>
### How to recover files before a job times out
Possibly you would like to clean up the work directory or recover files
for restart in case a job times out. In this example we ask Slurm to
send a signal to our script 120 seconds before it times out to give us a
chance to perform clean-up actions.
<div class="literalinclude" language="bash">
files/slurm-timeout-cleanup.sh
</div>
## OpenMP and MPI
You can download the examples given here to a file (e.g. run.sh) and
start it with:
``` bash
$ sbatch run.sh
```
### Example for an OpenMP job
<div class="literalinclude" language="bash">
files/slurm-OMP.sh
</div>
### Example for a MPI job
<div class="literalinclude" language="bash">
files/slurm-MPI.sh
</div>
### Example for a hybrid MPI/OpenMP job
<div class="literalinclude" language="bash">
files/slurm-MPI-OMP.sh
</div>
If you want to start more than one MPI rank per node you can use
`--ntasks-per-node` in combination with `--nodes`:
``` bash
#SBATCH --nodes=4 --ntasks-per-node=2 --cpus-per-task=8
```
This will start 2 MPI tasks each on 4 nodes, where each task can use up
to 8 threads.
#!/bin/bash -l
#######################################
# example for a hybrid MPI OpenMP job #
#######################################
#SBATCH --job-name=example
# we ask for 4 MPI tasks with 10 cores each
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10
# run for five minutes
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB
# turn on all mail notification
#SBATCH --mail-type=ALL
# you may not place bash commands before the last SBATCH directive
# define and create a unique scratch directory
SCRATCH_DIRECTORY=/global/work/${USER}/example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
# we copy everything we need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}
# we set OMP_NUM_THREADS to the number cpu cores per MPI task
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# we execute the job and time it
time mpirun -np $SLURM_NTASKS ./my_binary.x > my_output
# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}
# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
# happy end
exit 0
#!/bin/bash -l
##########################
# example for an MPI job #
##########################
#SBATCH --job-name=example
# 80 MPI tasks in total
# Stallo has 16 or 20 cores/node and therefore we take
# a number that is divisible by both
#SBATCH --ntasks=80
# run for five minutes
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB
# turn on all mail notification
#SBATCH --mail-type=ALL
# you may not place bash commands before the last SBATCH directive
# define and create a unique scratch directory
SCRATCH_DIRECTORY=/global/work/${USER}/example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
# we copy everything we need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}
# we execute the job and time it
time mpirun -np $SLURM_NTASKS ./my_binary.x > my_output
# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}
# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
# happy end
exit 0
#!/bin/bash -l
#############################
# example for an OpenMP job #
#############################
#SBATCH --job-name=example
# we ask for 1 task with 20 cores
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
# exclusive makes all memory available
#SBATCH --exclusive
# run for five minutes
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# turn on all mail notification
#SBATCH --mail-type=ALL
# you may not place bash commands before the last SBATCH directive
# define and create a unique scratch directory
SCRATCH_DIRECTORY=/global/work/${USER}/example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
# we copy everything we need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}
# we set OMP_NUM_THREADS to the number of available cores
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# we execute the job and time it
time ./my_binary.x > my_output
# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}
# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
# happy end
exit 0
#!/bin/bash -l
###################################################
# Example for a job that consumes a lot of memory #
###################################################
#SBATCH --job-name=example
# we ask for 1 node
#SBATCH --nodes=1
# run for five minutes
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# total memory for this job
# this is a hard limit
# note that if you ask for more than one CPU has, your account gets
# charged for the other (idle) CPUs as well
#SBATCH --mem=31000MB
# turn on all mail notification
#SBATCH --mail-type=ALL
# you may not place bash commands before the last SBATCH directive
# define and create a unique scratch directory
SCRATCH_DIRECTORY=/global/work/${USER}/example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
# we copy everything we need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}
# we execute the job and time it
time ./my_binary.x > my_output
# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}
# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
# happy end
exit 0
#!/bin/bash -l
##############################
# Job blueprint #
##############################
# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example
# Define, how many nodes you need. Here, we ask for 1 node.
# Each node has 16 or 20 CPU cores.
#SBATCH --nodes=1
# You can further define the number of tasks with --ntasks-per-*
# See "man sbatch" for details. e.g. --ntasks=4 will ask for 4 cpus.
# Define, how long the job will run in real time. This is a hard cap meaning
# that if the job runs longer than what is written here, it will be
# force-stopped by the server. If you make the expected time too long, it will
# take longer for the job to start. Here, we say the job will take 5 minutes.
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# Define the partition on which the job shall run. May be omitted.
#SBATCH --partition normal
# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem-per-cpu=1500MB
##SBATCH --mem=5GB # this one is not in effect, due to the double hash
# Turn on mail notification. There are many possible self-explaining values:
# NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
# For more values, check "man sbatch"
#SBATCH --mail-type=END,FAIL
# You may not place any commands before the last SBATCH directive
# Define and create a unique scratch directory for this job
SCRATCH_DIRECTORY=/global/work/${USER}/${SLURM_JOBID}.stallo-adm.uit.no
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
# You can copy everything you need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
cp ${SLURM_SUBMIT_DIR}/myfiles*.txt ${SCRATCH_DIRECTORY}
# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command worked
time sleep 10
#sleep 10
# After the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}
# In addition to the copied files, you will also find a file called
# slurm-1234.out in the submit directory. This file will contain all output that
# was produced during runtime, i.e. stdout and stderr.
# After everything is saved to the home directory, delete the work directory to
# save space on /global/work
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
# Finish the script
exit 0
#!/bin/bash -l
#####################
# job-array example #
#####################
#SBATCH --job-name=example
# 16 jobs will run in this array at the same time
#SBATCH --array=1-16
# run for five minutes
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB
# you may not place bash commands before the last SBATCH directive
# define and create a unique scratch directory
SCRATCH_DIRECTORY=/global/work/${USER}/job-array-example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
cp ${SLURM_SUBMIT_DIR}/test.py ${SCRATCH_DIRECTORY}
# each job will see a different ${SLURM_ARRAY_TASK_ID}
echo "now processing task id:: " ${SLURM_ARRAY_TASK_ID}
python test.py > output_${SLURM_ARRAY_TASK_ID}.txt
# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp output_${SLURM_ARRAY_TASK_ID}.txt ${SLURM_SUBMIT_DIR}
# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
# happy end
exit 0
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=20
#SBATCH --time=0-00:05:00
#SBATCH --mem-per-cpu=500MB
cd ${SLURM_SUBMIT_DIR}
# first set of parallel runs
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
wait
# here a post-processing step
# ...
# another set of parallel runs
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
mpirun -n 4 ./my-binary &
wait
exit 0
#!/bin/bash -l
# job name
#SBATCH --job-name=example
# replace this by your account
#SBATCH --account=...
# one core only
#SBATCH --ntasks=1
# we give this job 4 minutes
#SBATCH --time=0-00:04:00
# asks SLURM to send the USR1 signal 120 seconds before end of the time limit
#SBATCH --signal=B:USR1@120
# define the handler function
# note that this is not executed here, but rather
# when the associated signal is sent
your_cleanup_function()
{
echo "function your_cleanup_function called at $(date)"
# do whatever cleanup you want here
}
# call your_cleanup_function once we receive USR1 signal
trap 'your_cleanup_function' USR1
echo "starting calculation at $(date)"
# the calculation "computes" (in this case sleeps) for 1000 seconds
# but we asked slurm only for 240 seconds so it will not finish
# the "&" after the compute step and "wait" are important
sleep 1000 &
wait
#!/usr/bin/env python
import time
print('start at ' + time.strftime('%H:%M:%S'))
print('sleep for 10 seconds ...')
time.sleep(10)
print('stop at ' + time.strftime('%H:%M:%S'))
# Interactive jobs
## Starting an interactive job
You can run an interactive job like this:
$ srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
Here we ask for a single core on one interactive node for one hour with
the default amount of memory. The command prompt will appear as soon as
the job starts.
This is how it looks once the interactive job starts:
srun: job 12345 queued and waiting for resources
srun: job 12345 has been allocated resources
Exit the bash shell to end the job. If you exceed the time or memory
limits the job will also abort.
Interactive jobs have the same policies as normal batch jobs, there are
no extra restrictions. You should be aware that you might be sharing the
node with other users, so play nice.
Some users have experienced problems with the command, then it has
helped to specify the cpu account:
$ srun --account=<NAME_OF_MY_ACCOUNT> --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
## Keeping interactive jobs alive
Interactive jobs die when you disconnect from the login node either by
choice or by internet connection problems. To keep a job alive you can
use a terminal multiplexer like `tmux`.
tmux allows you to run processes as usual in your standard bash shell
You start tmux on the login node before you get a interactive slurm
session with `srun` and then do all the work in it. In case of a
disconnect you simply reconnect to the login node and attach to the tmux
session again by typing:
tmux attach
or in case you have multiple sessions running:
tmux list-session
tmux attach -t SESSION_NUMBER
As long as the tmux session is not closed or terminated (e.g. by a
server restart) your session should continue. One problem with our
systems is that the tmux session is bound to the particular login server
you get connected to. So if you start a tmux session on star-1 and
next time you get randomly connected to star-2 you first have to
connect to star-1 again by:
ssh login-1
To log out a tmux session without closing it you have to press CTRL-B
(that the Ctrl key and simultaneously "b", which is the standard tmux
prefix) and then "d" (without the quotation marks). To close a session
just close the bash session with either CTRL-D or type exit. You can get
a list of all tmux commands by CTRL-B and the ? (question mark). See
also [this
page](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/) for
a short tutorial of tmux. Otherwise working inside of a tmux session is
almost the same as a normal bash session.
# Managing jobs
The lifecycle of a job can be managed with as little as three different
commands:
1. Submit the job with `sbatch <script_name>`.
2. Check the job status with `squeue`. (to limit the display to only
your jobs use `squeue -u <user_name>`.)
3. (optional) Delete the job with `scancel <job_id>`.
You can also hold the start of a job:
scontrol hold \<job_id\>
Put a hold on the job. A job on hold will not start or block other jobs
from starting until you release the hold.
scontrol release \<job_id\>
Release the hold on a job.
## Job status descriptions in squeue
When you run `squeue` (probably limiting the output with
`squeue -u <user_name>`), you will get a list of all jobs currently
running or waiting to start. Most of the columns should be
self-explaining, but the *ST* and *NODELIST (REASON)* columns can be
confusing.
*ST* stands for *state*. The most important states are listed below. For
a more comprehensive list, check the [squeue help page section Job State
Codes](https://slurm.schedmd.com/squeue.html#lbAG).
R
The job is running
PD
The job is pending (i.e. waiting to run)
CG
The job is completing, meaning that it will be finished soon
The column *NODELIST (REASON)* will show you a list of computing nodes
the job is running on if the job is actually running. If the job is
pending, the column will give you a reason why it still pending. The
most important reasons are listed below. For a more comprehensive list,
check the [squeue help page section Job Reason
Codes](https://slurm.schedmd.com/squeue.html#lbAF).
Priority
There is another pending job with higher priority
Resources
The job has the highest priority, but is waiting for some running job to
finish.
QOS\*Limit
This should only happen if you run your job with `--qos=devel`. In
developer mode you may only have one single job in the queue.
launch failed requeued held
Job launch failed for some reason. This is normally due to a faulty
node. Please contact us via <support@metacenter.no> stating the problem,
your user name, and the jobid(s).
Dependency
Job cannot start before some other job is finished. This should only
happen if you started the job with `--dependency=...`
DependencyNeverSatisfied
Same as *Dependency*, but that other job failed. You must cancel the job
with `scancel JOBID`.
# Monitoring your jobs
## SLURM commands
To monitor your jobs, you can use of of those commands. For details run
them with the <span class="title-ref">-</span>-help option:
`scontrol show jobid -dd <jobid>` lists detailed information for a job
(useful for troubleshooting).
`sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed` will give you
statistics on completed jobs by jobID. Once your job has completed, you
can get additional information that was not available during the run.
This includes run time, memory used, etc.
From our monitoring tool Ganglia, you can watch live status information
on Star:
- [Load situation](http://star-adm.uit.no/ganglia/)
- [Job
queue](http://star-login2.uit.no/slurmbrowser/html/squeue.html)
## CPU load and memory consumption of your job
Star has only limited resources and usually a high demand. Therefore
using the available resources as efficient as possible is paramount to
have short queueing times and getting most out of your quota.
### Accessing slurmbrowser
In order to find out the CPU load and memory consumption of your running
jobs or jobs which have finished less than 48 hours ago, please use the
[job browser](http://star-login2.uit.no/slurmbrowser/html/squeue.html)
(accessible from within the UNINETT network).
For users from outside of UNINETT, you have to setup a ssh tunnel to
star with:
ssh -L8080:localhost:80 star-login2.uit.no
After that you can access slurmbrowser at
<http://localhost:8080/slurmbrowser/html/squeue.html>
### Detecting inefficient jobs
You can filter for a slurm job ID, account name or user name with the
search bar in the upper left corner.
For single- or multinode jobs the `AvgNodeLoad` is an important
indicator if your jobs runs efficiently, at least with respect to CPU
usage. If you use the whole node, the average node load should be close
to number of CPU cores of that node (so 16 or 20 on Star). In some
cases it is totally acceptable to have a low load if you for instance
need a lot of memory but in general either CPU or memory load should be
high. Otherwise you are wasting your quota and experience probably
longer than necessary queuing times.
If you detect inefficient jobs you either look for ways to improve the
resource usage of your job or ask for less resources in your SLURM
script.
## Understanding your job status
When you look at the job queue through the [job
browser](http://star-login2.uit.no/slurmbrowser/html/squeue.html), or
you use the `squeue` command, you will see that the queue is divided in
3 parts: Active jobs, Idle jobs, and Blocked jobs.
Active jobs are the jobs that are running at the moment. Idle jobs are
next in line to start running, when the needed resources become
available. Each user can by default have only one job in the Idle Jobs
queue.
Blocked jobs are all other jobs. Their state can be *Idle*, *Hold*, or
*Deferred*. *Idle* means that they are waiting to get to the Idle queue.
They will eventually start when the resources become available. The jobs
with the *Hold* state have been put on hold either by the system, or by
the user. F.e. if you have one job in the Idle queue, that is not very
important to you, and it is blocking other, more urgent, jobs from
starting, you might want to put that one job on hold. Jobs on hold will
not start until the hold is released. *Deferred* jobs will not start. In
most cases, the job is deferred because it is asking for a combination
of resources that Star can not provide.
Please contact the support staff, if you don't understand why your job
has a hold or deferred state.
## Summary of used resources
Slurm will append a summary of used resources to the `slurm-xxx.out`
file. The fields are:
- Task and CPU usage stats
- `AllocCPUS`: Number of allocated CPUs
- `NTasks`: Total number of tasks in a job or step.
- `MinCPU`: Minimum CPU time of all tasks in job (system + user).
- `MinCPUTask`: The task ID where the mincpu occurred.
- `AveCPU`: Average CPU time of all tasks in job (system + user)
- `Elapsed`: The jobs elapsed time in format \[DD-\[HH:\]\]MM:SS.
- `ExitCode`: The exit code returned by the job script. Following
the colon is the signal that caused the process to terminate if
it was terminated by a signal.
- Memory usage stats
- `MaxRSS`: Maximum resident set size of all tasks in job.
- `MaxRSSTask`: The task ID where the maxrss occurred.
- `AveRSS`: Average resident set size of all tasks in job.
- `MaxPages`: Maximum number of page faults of all tasks in job.
- `MaxPagesTask`: The task ID where the maxpages occurred.
- `AvePages`: Average number of page faults of all tasks in job.
- Disk usage stats
- `MaxDiskRead`: Maximum number of bytes read by all tasks in job.
- `MaxDiskReadTask`: The task ID where the maxdiskread occurred.
- `AveDiskRead`: Average number of bytes read by all tasks in job.
- `MaxDiskWrite`: Maximum number of bytes written by all tasks in
job.
- `MaxDiskWriteTask`: The task ID where the maxdiskwrite occurred.
- `AveDiskWrite`: Average number of bytes written by all tasks in
job.
# Running MPI jobs
There are two available MPI implementations on Star:
- OpenMPI provided by the `foss` module, e.g. `module load foss/2016b`
- Intel MPI provided by the `intel` module, e.g.
`module load intel/2016b`
There are several ways of launching an MPI application within a SLURM
allocation, e.g. `srun`, `mpirun`, `mpiexec` and `mpiexec.hydra`.
Unfortunately, the best way to launch your program depends on the MPI
implementation (and possibly your application), and choosing the wrong
command can severly affect the efficiency of your parallel run. Our
recommendation is the following:
## Intel MPI
With Intel MPI, we have found that `mpirun` can incorrectly distribute
the MPI ranks among the allocated nodes. We thus recommend using `srun`:
$ srun --mpi=pmi2 ./my_application
## OpenMPI
With OpenMPI, `mpirun` seems to be working correctly. Also, it seems
that `srun` fails to launch your application in parallel, so here we
recommend using `mpirun`:
$ mpirun ./my_application
NOTE: If you're running on the `multinode` partition you automatically
get the `--exclusive` flag, e.i. you get allocated (and charged for)
**full** nodes, even if you explicitly ask for less resources per node.
This is not the case for the `normal` partition.
This diff is collapsed.
.. _new_sw:
New software and module scheme
==============================
**On the 9th-10th of May 2017, the default module setup changed. It is likely that you will have problems loading modules and jobs waiting to start might crash. Jobs already running should not be affected.**
What this means:
We have many new software packages installed, that were not visible by default,
though they were already accessible. On the afore mentioned days we made these packages visible by default. There are several big changes that you will experience, and sadly this might break your job scripts and cause you error messages when trying to load modules. Jobs that are waiting to start might crash at start because they can not load modules, but already running jobs should run without problems. We apologize for these problems and we believe that this way of switching is still a better alternative than shutting Stallo down.
If you have trouble or questions, please, check this manual and contact us.
Here are the main changes you will experience:
Change of software module names
-------------------------------
You will still be able to use the "old" software you used before, but many names of sw packages will change.
The name changes are in using mixed case instead of just small case, f.e. "python" is will become "Python", "openmpi" becomes "OpenMPI", and so on. The version of the modules has also changed. In most modules the version now contains some basic information on what it was compiled with - f.e. intel/2016a or foss/2016a (Free Open Source Software, commonly referred to as GNU) - and in some cases also information on some dependency - f.e. Python/2.7.12. In the new installation, the name "intel" has a somewhat different meaning from the old installations: loading intel/13.0 will give you the intel module, and you will have icc and ifort available to you; loading intel/2016a will load several packages including icc, ifort, imkl and impi. More information on this coming soon.
If you are looking for a package, try::
$ module avail <package_name>
This command is case insensitive, and lists you all modules, which you can load for this package
(both from the new and the old installations). If you get too many results because the name of your package is part of other names (f.e. "R", "intel", or "Python") add a "/" at the end of the name::
$ module avail <package_name>/
Behavior of potentially conflicting modules
-------------------------------------------
Another important change is behavior of potentially conflicting modules. In the "old setup" you are
not able to have two conflicting modules - f.e. openmpi and impi - loaded at the same time.
This is only partially true for the "new scheme" and you will need to pay more attention to this.
With the new packages, it is also more common that when you load one package it will automatically
load several other packages that it needs to function.
Check what you have currently loaded with::
$ module load
or ``ml`` for short, and unload conflicting modules, or do::
$ module purge
to remove all loaded packages and start over.
StdEnv and StdMod
-----------------
By default, you will have StdEnv and StdMod loaded. StdEnv sets up the paths and other variables for you to have a working module environment. It has a "sticky" tag and will not be unloaded with a ``module purge`` or a ``module unload StdEnv``. We strongly recommend keeping it loaded.
StdMod loads the default system modules. You are welcome to unload it, if you wish, and both ``module unload StdMod`` or ``module purge`` will unload it.
Lmod warning "rebuild your saved collection"
--------------------------------------------
Lmod allows a user to save a bundle of modules as a collection using ``module save <collection_name>`` and ``module restore <collection_name>``. This enables you to quickly get the same list of modules loaded if you tend to use the same modules over and over.
With a new module scheme came a different system MODULEPATH. For this reason, if you have some module collections saved, you will experience the following warning: "Lmod Warning: The system MODULEPATH has changed: please rebuild your saved collection."
To solve this you need to remove your old collections and create them again. We apologize for the inconvenience.
If you are already using "mod_setup.sh" or "/home/easybuild/"
-------------------------------------------------------------
If you have already started using the new modules by loading the "notur" module and sourcing "mod_setup.sh", simply remove these two steps and continue working as normal. The same goes for any references to /home/easybuild/. We kept these options working for now to reduce the number of potentially crashing jobs, but we plan to remove them in the near future, so we recommend removing those lines from any new jobs you launch. You do not need the module "varset" any more, StdEnv has taken over its functionality. If the only reason you were using the "notur" module was to access the newer installations, you do not need it either. If you were using the "notur" module to run ADF or other software, then you still need to continue using it.
.. figure:: tag.jpg
:scale: 57 %
Found in Via Notari, Pisa; (c) Roberto Di Remigio.
.. _news:
News and notifications
======================
News about planned/unplanned downtime, changes in hardware, and important
changes in software will be published on the HPC UiT twitter account
`<https://twitter.com/hpc_uit>`_ and on the login screen of stallo.
For more information on the different situations see below.
System status and activity
--------------------------
You can get a quick overview of the system load on Stallo on the
`Sigma2 hardware page <https://www.sigma2.no/hardware/status>`_.
More information on the system load, queued jobs, and node states can
be found on the `jobbrowser page <http://stallo-login2.uit.no/slurmbrowser/html/squeue.html>`_
(only visible from within the UiT network).
Planned/unplanned downtime
--------------------------
In the case of a planned downtime, a reservation will be made in the
queuing system, so new jobs, that would not finish until the downtime,
won't start. If the system goes down, running jobs will be restarted,
if possible. We apologize for any inconveniences this may cause.
# General comments about licenses
This is an attempt to describe the different licenses we have on Star
and for the entire NOTUR system when it comes to chemistry and material
science codes.
Here, I will only address the codes with a certain access limitation.
For free, open source codes without any limitations, I do not really see
the point in commenting. But, the reader should be aware that "free" not
allways means "unrestricted access".
Also, due to a choice of policy, I am only adressing the issues of
academic access. For commercial licenses, the user or user groups needs
to investigate consequences and prices themselves. For any given code,
the license model and related issues is more in depth described as a
part of the standard documentation.
There is in general 6 different categories of access limitations for
software on Star:
1. Licenses paid entirely by the users.
2. Licenses paid partially by the users and partly by national bodies.
3. Licenses paid partly or entirely by institutions.
4. Licenses paid entirely by national bodies.
5. Licenses available only on a machine or a service (machine-license).
6. Free licenses that requires an agreement with the vendor before
using the code.
Category 1 is what is the "bring your own license" group. Either the
vendor does not provide computer centre license, or pricing is somewhat
outside the boundaries of acceptance for NOTUR. VASP is in this
category, where NOTUR does administer a certain level of access based on
feedback from the VASP vendor. Other examples is Petrel, Molpro
(previously).
Category 2 is a joint funding model, where users have to add to the
total pool of finance in order to provide a certain level of license
tokens available. For instance, the Schrodinger small molecule drug
discovery suite license is split into a "national" and a "local" part,
with one common license server and unlimited tokens available for
everyone. NOTUR provides support and manages the license tokens on
behalf of the community, up until now as a part of DS Chem (predecessor
to the Application Managment project). The license is monitored, in
order to ensure a minimum level of fairness. If a user crosses a certain
level of usage without representing one of the groups contributing to
funding - access will be denied until finance formalities has been
settled.
Category 3 is the group of software where your respective host
institution has implication for access or not. Codes like COMSOL,
MATLAB, STATA, ANSYS, STAR-CCM+, IDL, Mathematica are all in this
category.
Category 4 is a lot more complex than you might anticipate. Codes like
ADF and Turbomole are in this category. ADF is currently entirely funded
by Sigma2, but how long this will continue is unknown. For ADF, we have
a national license - meaning that anyone can request a download and
install the code suite as long as they are members on a Norwegian Grant
degreeing institution. Turbomole has currently the form of a national
machine type license, as a part of prevoiously mentioned DS Chem. As
long as on person has the responsibility of installing Turbomole on all
the national academic high performance compute clusters, we only have to
by one common license for all these machines. Currently, the national
body has paid for a two group Crystal license, in order to map interest
and need. We have (or at least had when the license was bought) an
agreement that if the interest came to the level where we needed to
grant access to more people, we could transform the license into a
machine type license by only paying the difference in license fee.
Gaussian has the appearance of a national license, and is paid by
NOTUR - but it really is 4 site licenses making a total of "access
license for those who comes from one of the four original norwegian
academic hpc host sites - NTNU, UiB, UiO, Hofstra". It also means that
others that do hold a valid Gaussian license will be allowed access.
This licensing is limited to main version (G09, G16 etc) and may explain
limitations on the more recent versions. The intel tools and compiler
suites for use on the national academic hpc cluster is also in this
category.
Category 5 is the group of software that is only available on a single
installation/machine. On Star we have NBO6 and the nbo6 plug in for
ADF. On Abel they have an installation of Molpro which is both machine
type and site type licensed.
Category 6 In this group, we have Molcas, ORCA. (and maybe Dalton and
Dirac). Some vendors, even if they offer their software free for
academic usage in general, or for limited areas (Molcas is free for
academics from Nordic institutions), they still want to have a certain
control over the software distribution.
and, of course, category 7 is the entire group of software where this is
not really a problem;-).
You will find more info in the application guides or in the respective
module files. The latter info you get access to by typing:
``` bash
module help <application> # Like for instance ADF
```
---
sort: 5
---
# Software
{% include list.liquid all=true %}
# Which software is installed on Star
To find out what software packages are available, type:
module avail
## Changes in application software
News about planned/unplanned downtime, changes in hardware, and
important changes in software will be published on the HPC Hofstra twitter
account <https://twitter.com/hpc_uit> and on the login screen of star.
For more information on the different situations see <span
class="title-ref">news</span>.
The easiest way to check which software and versions available is to use
the `module` command. List all software available:
module avail
List all version of a specific software:
module avail software-name
# Missing or new software
If there is any software missing on this list that you would like to
have installed on Star, or you need help to install your own software,
please feel free to contact the support personal about this:
<support@metacenter.no>.
If there is software missing on Star that you would like to have,
there are different options to install it: \* Use our installation
framework EasyBuild \* Use a package manager like Conda or especially
for bioinformatic software Bioconda \* Compile it yourself and install
into your home directory \* Ask us to install for you
In this document we will show some ways of installing a new software. If
you run into problems, need help or have question, please feel free to
contact the support personal about this: <support@metacenter.no>.
## Installing new software using EasyBuild
Our building and installation framework,
[EasyBuild](https://easybuild.readthedocs.io/en/latest/Using_the_EasyBuild_command_line.html)
can be used by you to install software into your own home directory.
This is especially interesting for you if you need a software but it is
probably not used by other users of Star, so a systemwide installation
doesn't make much sense.
To see if your software is available via the EasyBuild system have a
look into the [easyconfig github
repository](https://github.com/easybuilders/easybuild-easyconfigs). if
you can find a version of your software there. Alternatively use the
search function of EasyBuild:
eb -S REGEX
Let's assume you want to install the genome aligner Kraken2. We find a
easyconfig file with the name
[Kraken2-2.0.7-beta-foss-2018b-Perl-5.28.0.eb](https://github.com/easybuilders/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/k/Kraken2/Kraken2-2.0.7-beta-foss-2018b-Perl-5.28.0.eb)
in the easyconfig repository.
Before we start the installation process we first have to load the
necessary modules:
ml purge
ml EasyBuild
Before we actually build we can check the building process by adding
<span class="title-ref">--dry-run-short</span> or <span
class="title-ref">-D</span> to the build command:
eb Kraken2-2.0.7-beta-foss-2018b-Perl-5.28.0.eb --dry-run-short
As you probably see, a long list of packages is printed with some
packages being marked as missing. To automatically install all necessary
dependencies we can add <span class="title-ref">--robot</span> and then
run the installation:
eb Kraken2-2.0.7-beta-foss-2018b-Perl-5.28.0.eb --robot
Before we can load your newly installed module, we have to add the path
to the Lmod search path. You might have to do this from time to time, or
even every time you use your software, as your local cache is regularly
updated. :
ml use ~/.local/easybuild/modules/all
ml Kraken2/2.0.7
## Conda/ Bioconda
Many software packages, especially if they are python based, can be
easily installed using the Conda package manager. For many
bioinformatics software, Bioconda has become a nice solution.
A small tutorial can be found in the [Python]({% link software/python_r_perl.md %}#python) section of this
documentation.
# Application guides
For a general explanation on how to make an application available for
use, the module system, and information about changes in application
software see module\_scheme.
.. _ADF:
===============================================
The Amsterdam Density Functional modeling suite
===============================================
This page contains general information about the ADF modeling suite installed on Stallo:
Related information
-------------------
.. toctree:: :maxdepth: 1
:titlesonly:
firsttime_adf
ADFprog
Band
firsttime_band
advanced
General Information
===================
Description
-----------
The DFT code ADF is used in many areas of chemistry and materials science, and consists of the following parts:
* The pure DFT code based on Slater type orbitals; ADF.
* The periodic DFT code BAND shares a lot of functionality with ADF.d for treating large periodic molecular systems.
* DFTB and MOPAC are fast approximate methods to study large molecules and big periodic systems, employing DFT-based and semi-empirical data, respectively.
* Reaction dynamics in large complex systems can be studied with bond order based ReaxFF.
* COSMO-RS uses quantum mechanical data from ADF to predict thermodynamic properties of solutions and mixtures (LogP, VLE, pKa, …)
Online info from vendor
-----------------------
* Homepage: https://www.scm.com
* Documentation: https://www.scm.com/doc
The support people in NOTUR, do not provide trouble shooting guides anymore, due to a national agreement that it is better for the community as a whole to add to the community info/knowledge pool where such is made available. For ADF/BAND we advise to search in general documentation, sending emails to support (either metacenter or scm) or trying the ADF mailing list (see https://www.scm.com/Support for more info).
License and access policy
-------------------------
The license of ADF/Band is commercial.
NOTUR holds a national license of the ADF program system, making usage of ADF/BAND available for all academic scient\
ists in Norway.
We have a national license for the following packages:
- ADF & ADFGUI
- BAND &BANDGUI
- CRS
- DFTB & DFTBGUI
- GUI
- REAXFF & REAXFFGUI
- NBO6 (on Stallo only, but machine license available for all users of Stallo).
`Please note that this is an academic type license; meaning that research institutes not being part of Norwegian Universities must provide their own l\
icense to be able to use and publish results obtained with this code on NOTUR installlations.`
Citation
--------
When publishing results obtained with the referred software referred, please do check the developers web page in order to find the correct citat\
ion(s).
.. _ADFprog:
====================================
Amsterdam Density Functional program
====================================
Information regarding the quantum chemistry code ADF on Stallo
General Information
===================
Description
-----------
According to the vendor, ADF (Amsterdam Density Functional) is a DFT program particularly strong in understanding and predicting structure, reactivity, and spectra of molecules. It is a Fortran program for calculations on atoms and molecules (in gas phase or solution). It can be used for the study of such diverse fields as molecular spectroscopy, organic and inorganic chemistry, crystallography and pharmacochemistry.
The underlying theory is the Kohn-Sham approach to Density-Functional Theory (DFT). The software is a DFT-only first-principles electronic structure calculations program system, and consists of a rich variety of packages.
Online documentation from vendor
--------------------------------
* Documentation: https://www.scm.com/doc/ADF
Usage
=====
The ADF/BAND suite of software is currently installed as precompiled binaries on Stallo. We install the intel-mpi version, since it has proven to collaborate the better with our mpi setup. We generally advise to run ADF on more than one node, unless you do know that your particular problem does not make the code scale well.
Use
.. code-block:: bash
$ module avail ADF
to see which versions of ADF which are available. Use
.. code-block:: bash
$ module load ADF/<version> # i.e adf2017.108
to get access to any given version of ADF.
.. _Band:
==========================
The periodic DFT code Band
==========================
Information regarding the periodic DFT code Band on Stallo. You may also be interested in this: :doc:`firsttime_band`.
General Information
===================
Description
-----------
BAND is an atomic-orbital based DFT program for periodic systems (crystals, slabs, chains and molecules).
The Amsterdam Density Functional Band-structure program - BAND - can be used for calculations on periodic systems, i.e. polymers, slabs and crystals, and is supplemental to the molecular ADF program for non-periodic systems. It employs density functional theory in the Kohn-Sham approach. BAND is very similar to ADF in the chosen algorithms, although important differences remain.
BAND makes use of atomic orbitals, it can handle elements throughout the periodic table, and has several analysis options available. BAND can use numerical atomic orbitals, so that the core is described very accurately. Because of the numerical orbitals BAND can calculate accurate total energies. Furthermore it can handle basis functions for arbitrary l-values.
Online info from vendor
-----------------------
* Documentation: https://www.scm.com/doc/BAND
Usage
=====
The ADF/BAND suite of software is currently installed as precompiled binaries on Stallo. We install the intel-mpi version, since it has proven to collaborate the better with our mpi setup. We generally advise to run Band on more than one node, unless you do know that your particular problem does not make the code scale well.
Use
.. code-block:: bash
$ module avail ADF
to see which versions of Band which are available. Use
.. code-block:: bash
$ module load ADF/<version> # i.e adf2017.108
to get access to any given version of Band.
.. _adf_advanced:
==============================
Information for advanced users
==============================
Scaling behaviour
-----------------
Since ADF is a very complex code, able to solve a vast range of chemistry problems - giving a unified advice regarding scaling is difficult. We will try to inspect scaling behaviour related to most used areas of application. For a standard geometry optimization, it seems to scale well in the region of 4-6 full nodes (60-100 cores) at least. For linear transit we would currently stay at no more than 4 full nodes or less currently.Unless having tests indicating otherwise, users who want to run large jobs should allocate no more than the prescribed numbers of processors. More information will come.
Memory allocation
-----------------
On Stallo there are 32 GB and 128GB nodes. Pay close attention to memory usage of job on the nodes where you run, and if necessary redistribute the job so that it uses less than all cores on the node until the limit of 32 GB/core. More than that, you will need to ask for access to the highmem-queue. As long as you do not ask for more than 2 GB/core, using the pmem-flag for torque does in principle give no meaning.
How to restart an ADF job
-------------------------
#. In the directory where you started your job, rename or copy the job-output t21 file into $SCM_TMPDIR/TAPE21.
#. In job.inp file, put RESTART TAPE21 just under the comment line.
#. Submit job.inp file as usual.
This might also be automized, we are working on a solution for restart after unexpected downtime.
How to run ADF using fragments
------------------------------
This is a brief introduction to how to create fragments necessary for among other things, BSSE calculations and proper broken symmetry calculations.
**Running with fragments:**
* Download and modify script for fragment create run, e.g. this template: Create.TZP.sh (modify ACCOUNT, add desired atoms and change to desired basis and desired functional)
* Run the create job in the same folder as the one where you want to run your main job(s) (sbatch Create.TZP.sh).
* Put the line cp $init/t21.* . in your ADF run script (in your $HOME/bin directory)
* In job.inp, specify correct file name in FRAGMENT section, e.g. “H t21.H_tzp”. * Submit job.inp as usual.
Running with fragments is only necessary when you want to run a BSSE calculations or manipulate charge and/or atomic mass for a given atom (for instance modeling heavy isotope labelled samples for frequency calculations).
.. _first_time_adf:
==============================
First time you run an ADF job?
==============================
This page contains info aimed at first time
users of ADF on Stallo, but may also be usefull to
more experienced users. Please look carefully through the
provided examples. Also note that the job-script example is rather richly
commented to provide additional and relevant info.
If you want to run this testjob, download the copies of the scripts and put them
into your test job folder (which I assume you have created in advance).
ADF input example
-----------------
.. include:: ../files/caffeine.adf
:literal:
This file can also be downloaded here: :download:`Caffeine input for ADF <../files/caffeine.adf>`.
Place this file in a job folder of choice, say ADFFIRSTJOB in your home directory on Stallo.
then, copy the job-script as seen here:
.. include:: ../files/job_adf.sh
:literal:
(Can also be downloaded from here: :download:`job_adf.sh <../files/job_adf.sh>`)
Place this script in the same folder and type:
.. code-block:: bash
sbatch job_adf.sh
You may now have submitted your first adf job.
These files are also available on Stallo
----------------------------------------
.. code-block:: bash
module load ADF/adf2017.108
cd <to whatever you call your test folder> # for instance ADFFIRSTJOB
cp -R /global/hds/software/notur/apprunex/ADF/* .
To verify that the jobs has worked fine, check the outputfile. If it says EXIT and print Bond Energies at the end of the file, you are likely on the safe side.
Check the Bond Energy in adf_caffeine.out. The value should ideally be close to -156.75317227 eV. When this is the case, you may alter variables in the shell script as much as you like to adapt to your own jobs.
**NB: ADF is installed as precompiled binaries, they come with their own mpi (intel MPI). So if you are not using the provided runscript example and/or loading the newer modules - please make sure that you load an intelmpi module and also preferably a rather recent intel compiler.**
Also note that, on Stallo, we hold the nbo6 plug in license that allows users of adf to produce files that can be read with the nb06 software.
Good luck with your chemistry!
.. _first_time_band:
==============================
First time you run a Band job?
==============================
This page contains info aimed at first time
users of Band on Stallo, but may also be usefull to
more experienced users. Please look carefully through the
provided examples. Also note that the job-script example is rather richly
commented to provide additional and relevant info.
If you want to run this testjob, download the copies of the scripts and put them
into your test job folder (which I assume you have created in advance).
Band input example
------------------
.. include:: ../files/silicone.adf
:literal:
This file can also be downloaded here: :download:`Silicone input for Band <../files/silicone.adf>`.
Place this file in a job folder of choice, say BANDFIRSTJOB in your home directory on Stallo.
then, copy the job-script as seen here:
.. include:: ../files/job_band.sh
:literal:
(Can also be downloaded from here: :download:`job_band.sh <../files/job_band.sh>`)
Place this script in the same folder and type:
.. code-block:: bash
sbatch job_band.sh
You may now have submitted your first adf job.
These files are also available on Stallo
----------------------------------------
.. code-block:: bash
module load ADF/adf2017.108
cd <to whatever you call your test folder> # for instance BANDFIRSTJOB
cp -R /global/hds/software/notur/apprunex/ADF/* .
To verify that the jobs has worked fine, check the outputfile. If it says EXIT and print Energy of Formation at the end of the file, you are likely on the safe side.
#Check the ENERGY OF FORMATION in band_silicone.out. The value should ideally be close to -4.2467 eV. When this is the case, you may alter variables in the shell script as much as you like to adapt to your own jobs.
**NB: The ADF modeling suite is installed as precompiled binaries, they come with their own mpi (intel MPI). So if you are not using the provided runscript example and/or loading the newer modules - please make sure that you load an intelmpi module and also preferably a rather recent intel compiler.**
Good luck with your chemistry!
.. _Dalton:
======
Dalton
======
Related information
===================
.. toctree::
:maxdepth: 1
firsttime_dalton.rst
General information
===================
Description
-----------
Dalton is an ab initio quantum chemistry computer program. It is capable of calculating various molecular properties using the Hartree-Fock, MP2, MCSCF and coupled cluster theories. Dalton also supports density functional theory calculations.
Online information from vendor
------------------------------
* Homepage: http://daltonprogram.org/
* Documentation: http://daltonprogram.org/documentation/
* For download: http://daltonprogram.org/download/
License and access policy
-------------------------
Dalton is licensed under the GPL 2. That basically means that everyone may freely use the program.
Citation
--------
When publishing results obtained with the referred software referred, check the following page to find the correct citation(s): http://daltonprogram.org/citation/
Usage
=====
You load the application by typing:
.. code-block:: bash
module load Dalton/2016-130ffaa0-intel-2017a
For more information on available versions of Dalton, type:
.. code-block:: bash
module avail Dalton
.. _first_time_dalton:
================================
First time you run a Dalton job?
================================
This page contains info aimed at first time
users of Dalton on Stallo. Please look carefully through the
provided examples. Also note that the job-script example is rather richly
commented to provide additional and relevant info.
If you want to run this testjob, copy the input example and the job script example shown below
into your test job folder (which I assume you have created in advance).
Dalton needs one molecule file and one input file. For simplicity, we have tarred them together and you
may download them here: :download:`Dalton files<../files/dalton_example_files.tar.gz>`
Molcas runscrip example
-----------------------
.. include:: ../files/job_dalton.sh
:literal:
You can also download the runscript file here: :download:`Dalton run script<../files/job_dalton.sh>`
The runscript example and the input are also on Stallo
------------------------------------------------------
Type:
.. code-block:: bash
cd <whatever_test_folder_you_have> # For instance testdalton
cp -R /global/hds/software/notur/apprunex/dalton/* .
When you have all the necessary files in the correct folders, submit the job by typing:
.. code-block:: bash
sbatch job_dalton.sh
Good luck.
=========
GaussView
=========
Gaussview is a visualization program that can be used to open Gaussian output
files and checkpoint files (.chk) to display structures, molecular orbitals,
normal modes, etc. You can also set up jobs and submit them directly.
Official documentation: https://gaussian.com/gaussview6/
GaussView on Stallo
-------------------
To load and run GaussView on Stallo, load the relevant Gaussian module, and
then call GaussView::
$ module avail Gaussian
$ module load Gaussian/g16_B.01
$ gview
# Gaussian
{% include list.liquid all=true %}
%chk=example
%mem=500MB
#p b3lyp/cc-pVDZ opt
benzene molecule example
0 1
C 0.00000 1.40272 0.00000
C 0.00000 -1.40272 0.00000
C 1.21479 0.70136 0.00000
C 1.21479 -0.70136 0.00000
C -1.21479 0.70136 0.00000
C -1.21479 -0.70136 0.00000
H 0.00000 2.49029 0.00000
H 0.00000 -2.49029 0.00000
H 2.15666 1.24515 0.00000
H 2.15666 -1.24515 0.00000
H -2.15666 1.24515 0.00000
H -2.15666 -1.24515 0.00000
#!/bin/bash -l
#SBATCH --account=nn....k
#SBATCH --job-name=example
#SBATCH --output=example.log
# Stallo nodes have either 16 or 20 cores: 80 is a multiple of both
#SBATCH --ntasks=80
# make sure no other job runs on the same node
#SBATCH --exclusive
# syntax is DD-HH:MM:SS
#SBATCH --time=00-00:30:00
# allocating 30 GB on a node and leaving a bit over 2 GB for the system
#SBATCH --mem-per-cpu=1500MB
################################################################################
# no bash commands above this line
# all sbatch directives need to be placed before the first bash command
# name of the input file without the .com extention
input=example
# flush the environment for unwanted settings
module --quiet purge
# load default program system settings
module load Gaussian/g16_B.01
# set the heap size for the job to 20 GB
export GAUSS_LFLAGS2="--LindaOptions -s 20000000"
# split large temporary files into smaller parts
lfs setstripe –c 8 .
# create scratch space for the job
export GAUSS_SCRDIR=/global/work/${USER}/${SLURM_JOB_ID}
tempdir=${GAUSS_SCRDIR}
mkdir -p ${tempdir}
# copy input and checkpoint file to scratch directory
cp ${SLURM_SUBMIT_DIR}/${input}.com ${tempdir}
if [ -f "${SLURM_SUBMIT_DIR}/${input}.chk" ]; then
cp ${SLURM_SUBMIT_DIR}/${input}.chk ${tempdir}
fi
# run the code
cd ${tempdir}
time g16.ib ${input}.com > ${input}.out
# copy output and checkpoint file back
cp ${input}.out ${SLURM_SUBMIT_DIR}
cp ${input}.chk ${SLURM_SUBMIT_DIR}
# remove the scratch directory after the job is done
# cd ${SLURM_SUBMIT_DIR}
# rm -rf /global/work/${USER}/${SLURM_JOB_ID}
exit 0
.. _gaussian_on_stallo:
==========================
Memory and number of cores
==========================
This page contains info about special features related to
the Gaussian install made on Stallo, but also general issues
related to Gaussian only vaguely documented elsewhere.
Choosing the right version
--------------------------
To see which versions of Gaussien are available, use::
$ module avail Gaussian/
To load a specific version of Gaussian, use for instance::
$ module load Gaussian/g16_B.01
Gaussian over Infiniband
------------------------
First note that the Gaussian installation on Stallo is the Linda parallel
version, so it scales somewhat initially. On top of this, Gaussian is installed
with a little trick, where the loading of the executable is intercepted before
launched, and an alternative socket library is loaded. This enables the
running Gaussian natively on the Infiniband network giving us
two advantages:
* The parallel fraction of the code scales to more cores.
* The shared memory performance is significantly enhanced (small scale performance).
But since we do this trick, we are greatly depending on altering the specific
node address into the input file: To run gaussian in parallel requires the
additional keywords ``%LindaWorkers`` and ``%NProcshared`` in the ``Link 0`` part of the
input file. This is taken care of by a wrapper script around the
original binary in each individual version folder. This
is also commented in the job script example. Please use
our example when when submitting jobs.
We have also taken care of the rsh/ssh setup in our installation procedure, to
avoid .tsnet.config dependency for users.
Parallel scaling
----------------
Gaussian is a rather large program system with a range of different binaries,
and users need to verify whether the functionality they use is parallelized and
how it scales.
Due to the preload Infiniband trick, we have a somewhat more generous policy
when it comes to allocating cores/nodes to Gaussian jobs but before computing a
table of different functionals and molecules, we strongly advice users to first
study the scaling of the code for a representative system.
Please do not reuse scripts inherited from others without studying the
performance and scaling of your job. If you need assistance with this, please
contact the user support.
We do advice people to use up to 256 cores (``--tasks``). We have observed
acceptable scaling of the current Gaussian install beyond 16 nodes for the jobs
that do scale outside of one node (i.e. the binaries in the $gXXroot/linda-exe
folder). Linda networking overhead seems to hit hard around this amount of
cores, causing us to be somewhat reluctant to advice going beyond 256 cores.
Since we have two different architectures with two different core counts on
Stallo, the ``--exclusive`` flag is important to ensure that the distribution
of jobs across the whole system are done in a rather flexible and painless way.
Memory allocation
-----------------
Gaussian takes care of memory allocation internally.
That means that if the submitted job needs more memory per core than what is in
average available on the node, it will automatically scale down the number of
cores to mirror the need. This also means that you always should ask for full
nodes when submitting Gaussian jobs on Stallo! This is taken care of by the
``--exclusive`` flag and commented in the job script example.
The ``%mem`` allocation of memory in the Gaussian input file means two things:
* In general it means memory/node – for share between ``nprocshared``, and
additional to the memory allocated per process. This is also documented by
Gaussian.
* For the main process/node it also represents the network
buffer allocated by Linda since the main Gaussian process takes a part
and Linda communication process takes a part equally sized – thus you should
never ask for more than half of the physical memory on the nodes, unless they
have swap space available - which you never should assume.
The general ``%mem`` limit will always be half of the physical memory
pool given in MB instead of GB - 16000MB instead of 16GB since this leaves a
small part for the system. That is why we would actually advise to use 15GB as
maximum ``%mem`` size.
Large temporary outputs on Stallo
---------------------------------
As commented here :doc:`/storage/storage` there is an issue related to very
large temporary files on Stallo. Please read up on it at act accordingly. This
issue is also commented in the job script example.
====================
Gaussian job example
====================
To run this example create a directory, step into it, create the input file and submit the script with::
$ sbatch gaussian-runscript.sh
Input example
-------------
Direct download link: :download:`input example <example.com>`
.. literalinclude:: example.com
:language: none
Runscript example
-----------------
Direct download link: :download:`runscript example <gaussian-runscript.sh>`
.. literalinclude:: gaussian-runscript.sh
:language: bash
======================
Gaussian and GaussView
======================
.. toctree::
:maxdepth: 1
job-example.rst
gaussian_on_stallo.rst
GaussView.rst
`Gaussian <http://gaussian.com>`_ is a popular computational chemistry program.
Official documentation: http://gaussian.com/man
License and access
------------------
The license is commercial/proprietary and constitutes of 4 site licenses for the 4 current host
institutions of NOTUR installations; NTNU, UiB, UiO, UiT. Only
persons from one of these institutions have access to the Gaussian Software
system installed on Stallo. Note that users that do not come from one of the
above mentioned institutions still may be granted access, but they need to
document access to a valid license for the version in question first.
* To get access to the code, you need to be in the ``gaussian`` group of users.
* To be in the ``gaussian`` group of users, you need be qualified for it - see above.
Citation
--------
For the recommended citation, please consult http://gaussian.com/citation/.
.. _Molcas:
======
Molcas
======
Related information
===================
.. toctree::
:maxdepth: 1
firsttime_molcas.rst
General information
===================
Description
-----------
Molcas is an ab initio quantum chemistry software package developed by scientists to be used by scientists. The basic philosophy is is to be able to treat general electronic structures for molecules consisting of atoms from most of the periodic table. As such, the primary focus of the package is on multiconfigurational methods with applications typically connected to the treatment of highly degenerate states.
Key aspects of Molcas:
* SCF/DFT, CASSCF/RASSCF, CASPT2/RASPT2
* Fast, accurate, and robust code
Online information from vendor
------------------------------
* Homepage: http://molcas.org/
* Documentation: http://molcas.org/documentation/manual/
* For download: http://molcas.org/download.html
License and access policy
-------------------------
Molcas comes with a non-free computer center license, though it is free for academic employees from the Nordic region.
The HPC group @ UiT have an agreement with Molcas as a part of the old application management project, that we do make a central
install of the code - but users who wants access needs to document that they have made an agreement with Molcas. This proof of agreement should
then be mailed (e or non-e) to support@metacenter.no to be granted membership of the molcas group on Stallo.
For the future, we are considering establishing the same policy as for :ref:`ORCA`, especially since the vendor has done a substantial job at decomplicating
the install procedure using CMake.
Citation
--------
When publishing results obtained with the referred software referred, please do check the developers web page in order to find the correct citation(s).
Usage
=====
You load the application by typing:
.. code-block:: bash
module load Molcas/molcas82-intel-2015a
For more information on available versions of Molcas, type:
.. code-block:: bash
module avail Molcas
.. _first_time_molcas:
================================
First time you run a Molcas job?
================================
This page contains info aimed at first time
users of Molcas on Stallo. Please look carefully through the
provided examples. Also note that the job-script example is rather richly
commented to provide additional and relevant info.
If you want to run this testjob, copy the input example and the job script example shown below
into your test job folder (which I assume you have created in advance).
Molcas comes with one input file and one geometry file. For simplicity, we have tarred them together and you
may download them here: :download:`Ethane-input<../files/C2H6.tar.gz>`
Molcas runscrip example
-----------------------
.. include:: ../files/job_molcas.sh
:literal:
**NB: Note that some of Molcas´s various modules are not mpi scalable. Consult the vendor-provided manual for scaling issues. Our example is based on a single-node all cores job.**
You can also download the runscript file here: :download:`Molcas run script<../files/job_molcas.sh>`
The runscript example and the input are also on Stallo
------------------------------------------------------
Type:
.. code-block:: bash
module load Molcas/molcas82-intel-2015a
cd <whereevertestfolderyouhave> # For instance testmolcas
cp -R /global/hds/software/notur/apprunex/Molcas/* .
When you have all the necessary files in the correct folders, submit the job by typing:
.. code-block:: bash
sbatch job_molcas.sh
To verify that nothing has gone wrong, check the status file. If it concludes "happy landing" you are on the safe side, most probably. Good luck.
.. _ORCA:
====
ORCA
====
Some users have requested an installation of the ORCA quantum chemistry program
on Stallo. Unfortunately, ORCA does not come with a computer center license,
which basically means that we cannot easily install it globally on Stallo.
However, the code is free of charge and available for single users or at
research group level, provided compliance with the *ORCA End User License
Agreement*. This means that interested users can download the precompiled
executables from the ORCA web page themselves, and run the code on Stallo.
* Homepage: https://orcaforum.kofo.mpg.de
Installation
============
The following has been tested for version 4.2.1 on Stallo:
1) Go to the ORCA forum page and register as user
2) Wait for activation e-mail (beware of spam filters...)
3) Log in and click on **Downloads** in the top menu
4) Download the Linux x86-64 complete archive and place it somewhere in your home folder
5) Extract the tarball: ``$ tar xf orca_4_2_1_linux_x86-64_openmpi314.tar.xz``
6) The ``orca`` executable is located in the extracted archive and should work out of the box
Usage
=====
In order to run ORCA in parallel the OpenMPI module must be loaded, and it is
important to use the **full path** to the executable
.. code-block:: bash
$ module load OpenMPI/3.1.3-GCC-8.2.0-2.31.1
$ /full/path/to/orca orca.inp >orca.out
Note that the calculation should not be launched by ``mpirun``, but you should use
the input keyword ``nprocs``. A very simple example input file is provided below.
Please refer to the ORCA manual (can be downloaded from the same download page
on the ORCA forum) for details on how to run the code for your application.
ORCA input example
------------------
.. include:: ../files/orca.inp
:literal:
# ORCA
{% include list.liquid all=true %}
# Chemistry
{% include list.liquid all=true %}
.. _Schrodinger:
==========================
Schrödinger Product Suites
==========================
A description of the Schrödinger product suites that are available for norwegian academic users
Related information
-------------------
.. toctree:: :maxdepth: 1
:titlesonly:
run_schrod
install_schrod
General Information
===================
NOTUR is, together with the user community, funding a national license of Schrödinger´s Small Molecule Drug Discovery Suite and PyMol. This implies that users are allowed to download and install their own copies of the software, and using the license server for providing license tokens. More about that in another document: :ref:`install_schrodinger`.
Online info from vendor
-----------------------
* Homepage: https://www.schrodinger.com/
About the suites:
* Small Molecule Drug Discovery Suite: https://www.schrodinger.com/suites/small-molecule-drug-discovery-suite
* PyMol: https://www.schrodinger.com/pymol
For documentation, you need to create an account (free) and log in. Documentation is also available in the software home folder:
.. code-block:: bash
$ module load Schrodinger/{version}
$ cd $SCRODINGER/docs
Remeber to point a pdf-reader or html reader to the documentation if you plan to read it on Stallo.
Support people in NOTUR do not provide trouble shooting guides anymore. For Schrödinger suites, the vendor company is giving good support \
on a system level. Problems related to running schrodinger on a NOTUR facility should be adressed to us.
Citation
--------
When publishing results obtained with the referred software, please do check the developers web page in order to find the correct citation(s).
License and access policy
-------------------------
The licenses of Schrödinger Product Suites are commercial.
NOTUR is, together with the user community, funding a national license of Schrödinger´s Small Molecule Drug Discovery Suite and PyMol. The licenses are administered by a license server based on flexlm. The adress for this license setup is available upon request to support@metacenter.no.
The outtake of license tokens is monitored on regular basis, and we try to catch those who seems to use the suite for regular scientific production and ask them to contribute financially to the overall deal. So far, this policy have worked fairly well.
All members of the co-funding research groups have unlimited access to the Schrödinger´s Small Molecule Drug Discovery Suite and PyMol license tokens.
`Please note that this is an academic type license; meaning that research institutes not being part of Norwegian Universities must provide their own license to be able to use and publish results obtained with this code on NOTUR installlations.`
Usage
=====
The most commot usage of schrodinger on Stallo is through the Maestro gui. Log in with x11-tunneling enabled, or through the web-interface http://stallo-gui.uit.no/.
Load the module of choice:
Use
.. code-block:: bash
$ module avail Schrodinger
to see which versions of Schrodinger are available.
Use
.. code-block:: bash
$ module load Schrodinger/<version> # i.e 2017-2-intel-2016a
to get access to Schrodinger. The batch resource allocation system is integrated with the gui through a schrodinger.hosts file which is centrally administered.
Please do not hold a local copy!
For examples on how to submit Schrodinger jobs on Stallo, look here :ref:`run_schrodinger`.
Finding available licenses
==========================
This should in principle be obsolete for users, since we are promised unlimited licenses in the national system. But still, for the curious souls:
If you want to know about avaible licenses; do the following
(after loading the schrodinger module)
.. code-block:: bash
$ licadmin STAT
This command will give you information about license status for the national Schrodinger suite licenses.
Access to PyMOL
===============
For users that wants/needs access to PyMOL, please fill out the following form: https://skjema.uio.no/pymol-access.
**Please not that this strategy replaces old habits of sending personal emails in this regard.**
.. _install_schrodinger:
Installing Schrodinger on your local client machine
===================================================
This page contains info about how to download and install the Schrodinger suite on your local client machine.
Getting an account on www.schrodinger.com
-----------------------------------------
You need to create an account (free) and log in with the password you create. Then you download the relevant software for your machine and follow install procedures that comes with the software.
Getting access to the national license
--------------------------------------
Since we have a joint funding of license tokens between user community and national bodies, members of the funding research groups are entiteled to get access to the license from their client machines. The easiest way to do this is by downloading the following license file:
.. include:: ../files/License_Schrodinger.txt
:literal:
(Can also be downloaded from here: :download:`License_Schrodinger.txt <../files/License_Schrodinger.txt>`)
This should be sufficient to make your personal copy of Schrodinger Small Molecule Drug Discovery suite work on your machine.
**Note: It is a known bug that the configure setup of Schrodinger reports no license found even if the program suite works well. Please ignore.**
.. _run_schrodinger:
Running Schrodinger jobs on Stallo
==================================
This page contains info about how to submit Schrodinger jobs on Stallo, based
on two sets of examples - one for docking and one for molecular dynamics -
provided to us by users.
Pay enhanced attention to the `ssh -Y c1-2` in examples below; this represents the adviced behaviour on how to run jobs on Stallo for your benefit solely!
A more thorough explanation to this, is that maestro starts a distribution and surveilance process that creates the jobs that enters the shared resources allocation (aka batch) system. If this process dies, the underlying jobs dies disregarding their computational status. This could have been solved by just running this on the login node, but imagine how it would have been with 1000 simultanious users sharing two login nodes and 50 of those ran 20-40 simultaneous perl and python processes each on the login nodes. So, please do as told.
Submitting jobs through the maestro interface
---------------------------------------------
Log in to Stallo with either x11-tunnelling enabled or through stallo-gui.uit.no.
To get download the jobscript example(s), type the following:Direct log in via ssh:
.. code-block:: bash
$ module load Schrodinger/2017-2-intel-2016a
$ cp -r /global/hds/software/notur/apprunex/Schrodinger/* ~
# Be sure this do not overwrite any folder or info you may want to keep in your home.
Note: This suite is quite extensive in features, and we generally advice you to either join tutorial courses, talk to experts in your proximity or read the vendor-provided documentation if you have absolutely no knowledge about how to use this suite of software. Here we only provide a couple of rough startup examples to get you up running.
Then;
**If you want to test the molecular dynamics package**
Do the following:
.. code-block:: bash
$ ssh -Y c1-2
$ module load Schrodinger/2017-2-intel-2016a
$ cd example_md
$ maestro -SGL
and start a Molecular Dynamics task from the Tasks menu bar. Load model system from file, choose desmond_md_example.cms
Set all settings to your satisfaction.
Open the job settings; you should only be presented the following options
#. localhost (only for job-setups)
#. batch-normal (2 days (=48hrs) of walltime/1000 cpus)
#. batch-long (21 days (504 hrs) of walltime/1000cpus)
Make sure that only one option is highlighted (only one of the batch-XXX´s). Press the run button and supervise running.
(This test case is provided us by Vladimir Pomogaev.)
**If you want to test the docking package**
Do the following:
.. code-block:: bash
$ ssh -Y c1-2
$ module load Schrodinger/2017-2-intel-2016a
$ cd example_docking
$ maestro -SGL
and start a Docking task from the Tasks menu bar.
Choose abl1-kinase-t316i_glide-grid.zip as grid
Use ligands from file; if you want to run a very short test - choose abl1-kinase-t316i_minimized.mae, for a longer test browse for SD and choose Asinex-50000-3D-minimized.sdf set of ligands. Set all settings to your satisfaction. For job settings, see example above.
(This test case is provided us by Bjorn Dalhus.)
If you want to run vsw routines from the command line
-----------------------------------------------------
The Schrodinger suite is shipped with scripts that connects the software installation with the system batch resource allocation setup, making it possible to submit glide jobs from the linux command line.
Examples of valid command line submissions using the vsw-tool on Stallo:
.. code-block:: bash
vsw *.inp -DRIVERHOST batch-normal -host_glide batch-normal:100 -NJOBS 1 –adjust
If you are worried that you will not refind your files for emergency startup:
.. code-block::bash
vsw *.inp -DRIVERHOST batch-normal -host_glide batch-normal:100 -NJOBS 1 -LOCAL
vsw *.inp -DRIVERHOST batch-normal -host_glide batch-normal:100 -NJOBS 1 -SAVE
**Note the following details:**
#. When the Schrodinger module is loaded on Stallo, the Schrodinger software folder is set in path, making $SCHRODINGER unecessary.
#. The Schrodinger setup on Stallo writes to the scratch file system by default, potentially making both the -LOCAL and the -SAVE flags uneccesary.
#. We do not recommend the -REMOTEDRIVER flag due to the risk of loosing jobs related to the admin process running out allocated time.
.. _TURBOMOLE:
============================
The TURBOMOLE program system
============================
Information regarding the quantum chemistry program system TURBOMOLE.
Related information
===================
.. toctree::
:maxdepth: 1
run_turbo.rst
General Information
===================
Description
-----------
TURBOMOLE is a quantum chemical program package, initially developed in the
group of Prof. Dr. Reinhart Ahlrichs at the University of Karlsruhe and at the
Forschungszentrum Karlsruhe. In 2007, the TURBOMOLE GmbH (Ltd) was founded by
the main developers of the program, and the company took over the responsibility
for the coordination of the scientific development of the program, to which it
holds all copy and intellectual property rights.
Online info from vendor
-----------------------
* Homepage: http://www.turbomole-gmbh.com
* Documentation: http://www.turbomole-gmbh.com/turbomole-manuals.html
License information
-------------------
Sigma2 holds a national TURBOMOLE license covering all computing centers in Norway.
Citation
--------
When publishing results obtained with this software, please check with the
developers web page in order to find the correct citation(s).
TURBOMOLE on Stallo
===================
Usage
-----
To see which versions of TURBOMOLE are available
.. code-block:: bash
$ module avail turbomole
Note that the latest (as of May 2018) version 7.2 comes in three different
parallelization variants
* TURBOMOLE/7.2
* TURBOMOLE/7.2.mpi
* TURBOMOLE/7.2.smp
where the latter two correspond to the old separation between distributed memory
(mpi) and shared memory (smp) implementations that some users may know from
previous versions of the program. We recommend, however, to use the new hybrid
parallelization scheme (new in v7.2) provided by the ``TURBOMOLE/7.2`` module.
In order to run efficiently in hybrid parallel the SLURM setup must be correct
by specifying CPUs rather than tasks per node
.. code-block:: bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
i.e. do NOT specify ``--ntasks=40``. Also, some environment variables must be
set in the run script
.. code-block:: bash
export TURBOTMPDIR=/global/work/${USER}/${SLURM_JOBID}
export PARNODES=${SLURM_NTASKS}
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
See the TURBOMOLE example for more details.
NOTE: The new hybrid program (v7.2) will launch a separate server process in
addition to the requested parallel processes, so if you run e.g. on 20 cores
each on two nodes you will find one process consuming up to 2000% CPU on each
node plus a single process on the first node consuming very little CPU.
NOTE: We have found that the hybrid program (v7.2) runs efficiently only if
*full* nodes are used, e.i. you should always ask for ``--cpus-per-node=20``,
and scale the number of nodes by the size of the calculation.
.. _turbomole_example:
==========================
TURBOMOL runscript example
==========================
.. include:: ../files/job_turbo.sh
:literal:
TURBOMOL example on Stallo
--------------------------
The above runscript can be found on Stallo along with the necessary input files
to perform a simple DFT calculation. To get access to the examples directory,
load the ``notur`` module
.. code-block:: bash
$ module load notur
$ cp -r /global/hds/software/notur/apprunex/TURBOMOLE .
From this directory you can submit the example script (you need to specify a
valid account in the script first)
.. code-block:: bash
$ sbatch job_turbo.sh
The calculation should take less than 2 min, but may spend much more than this
in the queue. Verify that the output file ``dscf.out`` ends with
.. code-block:: bash
**** dscf : all done ****
Please refer to the TURBOMOLE manual for how to setup different types of
calculations.
# VASP
{% include list.liquid all=true %}
.. _VASP:
==========================================
VASP (Vienna Ab initio Simulation Package)
==========================================
Information regarding the Vienna Ab initio Simulation Package (VASP) installed on Stallo.
Related information
===================
.. toctree::
:maxdepth: 1
firsttime_vasp.rst
vasp_on_stallo.rst
General information
===================
Description
-----------
VASP is a complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set. The approach implemented in VASP is based on the (finite-temperature) local-density approximation with the free energy as variational quantity and an exact evaluation of the instantaneous electronic ground state at each MD time step.
VASP uses efficient matrix diagonalisation schemes and an efficient Pulay/Broyden charge density mixing. These techniques avoid all problems possibly occurring in the original Car-Parrinello method, which is based on the simultaneous integration of electronic and ionic equations of motion.
The interaction between ions and electrons is described by ultra-soft Vanderbilt pseudopotentials (US-PP) or by the projector-augmented wave (PAW) method. US-PP (and the PAW method) allow for a considerable reduction of the number of plane-waves per atom for transition metals and first row elements. Forces and the full stress tensor can be calculated with VASP and used to relax atoms into their instantaneous ground-state.
There are various type plug ins and added functionality to VASP. On Stallo, we have added support for the Texas transition state tools (`vTST <http://theory.cm.utexas.edu/vtsttools/>`_), the `VASPsol <http://vaspsol.mse.ufl.edu>`_ implicit solvation model made by Hennig and Mathew from University of Florida and the `Bayesian Error Estimation Functionals <https://suncat.stanford.edu/about/facilities/software>`_ from the SUNCAT center, Standford.
Online information from vendor
------------------------------
* Homepage: https://www.vasp.at
* Documentation: https://cms.mpi.univie.ac.at/wiki/index.php/The_VASP_Manual
License and access policy
-------------------------
The VASP license is proprietary and commercial. It is based on group license policy, and for NOTUR systems VASP packages falls in the "bring your own license" category. See :ref:`About_licenses`.
The Vasp program is not distributed via site licences. However, HPC-staff in NOTUR have access to the VASP code to be able to support any research groups that have a valid VASP license.
VASP is a commercial software package that requires a license for all who wants use it. To run VASP:
#. Your group must have a valid licence. To acquire a licence, please consult this link: https://www.vasp.at/index.php/faqs/71-how-can-i-purchase-a-vasp-license.
#. We need to get a confirmation from a VASP representative to confirm that you have access to the license. Your group representative needs to contact Dr. Doris Vogtenhuber (doris.vogtenhuber@univie.ac.at) from the VASP team and ask her to send a confirmation email to us (support@metacenter.no) to confirm that you have a valid licence. Once we get a confirmation email we will grant you access to run VASP.
The support people in NOTUR, do not provide trouble shooting guides anymore, due to a national agreement that it is better for the community as a whole to add to the community info/knowledge pool where such is made available. Also, HPC staff from UiT does not provide any support to VASP 4 anymore, basically due to age of the code.
Citation
---------
When publishing results obtained with the referred software referred, please do check the developers web page in order to find the correct citation(s).
Usage
=====
You load the application by typing:
.. code-block:: bash
$ module load VASP
This command will give you the default version.
For more information on available versions, type:
.. code-block:: bash
$ module avail VASP
Experiencewise, VASP is a rather memory intensive code. Users are advised to read up on the general job script example(s) for SLURM and Stallo, and also how to specify memory in `SLURM <https://slurm.schedmd.com>`_.
About the VASP version(s) installed on Stallo
---------------------------------------------
Since the installation we made on Stallo is rather taylor made, we have gathered information about this here: :doc:`vasp_on_stallo`.
Here we also adress compilation relevant information for those who would like to do it themselves.
.. _first_time_vasp:
==============================
First time you run a VASP job?
==============================
This page contains info aimed at first time
users of VASP on Stallo, but may also be usefull to
more experienced users. Please look carefully through the
provided examples. Also note that the job-script example is rather richly
commented to provide additional and relevant info.
If you want to run this testjob, download the copies of the scripts and put them
into your test job folder (which I assume you have created in advance).
VASP input example
------------------
Download the tarred job :download:`CeO2job-files <../files/CeO2.tar.gz>`.
move this file to your test job folder on Stallo and type
.. code-block:: bash
tar -zxf CeO2.tar.gz
Then; download the job-script as seen here:
.. include:: ../files/job_vasp.sh
:literal:
Download it here :download:`job_vasp.sh <../files/job_vasp.sh>`.
These files are also available on Stallo
----------------------------------------
.. code-block:: bash
module load VASP/5.4.1.plain-intel-2016a
cd <to whatever you call your test folder> # for instance testvasp
cp -R /global/hds/software/notur/apprunex/VASP/* .
sbatch job_vasp.sh
and you are up running. Happy hunting.
.. _vasp_on_stallo:
================================
About the VASP install on Stallo
================================
This page contains information related to the installation of VASP on Stallo. Some of this is relevant also for self-compilation of the code, for those who want to give this a try.
VASP on Stallo
--------------
Note that the VASP installation on Stallo mainly follows the standard syntax introduced by the vASP team with their new installation scheme. Based on their system, we have added two binaries - as commented under.
If you do
.. code-block:: bash
module avail VASP
on Stallo, you will notice that for version 5.4.1 and onwards, there is a dramatic increase in the available binaries. And this might appear confusing.
First; All versions of VASP is compiled with the support for maximally-localised Wannier functions and the Wannier90 program and also the MPI flag in FPP (-DMPI)
Second; Each release of VASP is compiled in two different versions; "tooled" and "plain".
* VASP/x.y.z.tooled is a version where all necessary support for Texas transition state tools (vTST) and the explicit solvation model (VSPsol) and BEEF is added.
* VASP/x.y.z.plain is the version without this support/ additions. (Relatively unmodified source).
The reason for this is that we are uncertain of effects on the tooles on calculated numbers, due to reproducibility, we have chosen to hold the different versions separate.
Then it starts getting interesting: For each version, there are 15 different binaries - consisting of 3 groups of 5.
* unmodified group; binaries without any additional name to them; vasp_std
* noshear: vasp_std_noshear
* abfix: vasp_std_abfix
The unmodified group is compiled on basis on the unmodified set of fortran files that comes with the code.
The noshear version is compiled with a modified version of the file constr_cell_relax.F file where shear forces are not calculated.
The abfix version is compiled with a modified version of the constr_cell_relax.F file where two lattice vectors are fixed.
There are 5 different binaries in each group, all compiled with the same FPP settings (mentioned below):
* vasp_std
* vasp_gam
* vasp_ncl
These are familiar from the standard build system that is provided with the code. In addtion to these, we have
* vasp_tbdyn
* vasp_gam_tbdyn
FPP settings for each binary
----------------------------
#. vasp_std is compiled with the following additional FPP flag(s): -DNGZhalf
#. vasp_gam is compiled with the following additional FPP flag(s): -DNGZhalf -DwNGZhalf
#. vasp_ncl is compiled with the following additional FPP flag(s): no modifcations in FPP settings
#. vasp_tbdyn is compiled with the following additional FPP flag(s): -DNGZhalf -Dtbdyn
#. vasp_gam_tbdyn is compiled with the following additional FPP flag(s): -DNGZhalf -DwNGZhalf -Dtbdyn
We would be happy to provide a copy of our build scripts (patches) upon request.
About memory allocation for VASP
--------------------------------
VASP is known to be potentially memory demanding. Quite often, you might experience to use less than the full number of cores on the node, but still all of the memory.
For core-count, node-count and amounts of memory on Stallo, see :ref:`about_stallo`.
There are two important considerations to make:
First: Make sure that you are using the SBATCH --exclusive flag in your run script.
Second: How to allocate all the memory:
.. literalinclude:: ../../../jobs/files/slurm-big-memory.sh
:language: bash
:linenos:
*-------------------------------------------------------------------------------
&GATEWAY
coord
C2H6.xyz
basis
ANO-S-VDZ
group
y xz
*-------------------------------------------------------------------------------
&SEWARD
Title
Ethane DFT test job
*-------------------------------------------------------------------------------
>>foreach DFT in (BLYP, B3LYP )
&SCF ; KSDFT = $DFT
>>enddo
*-------------------------------------------------------------------------------
8
$Revision: 7.8 $ bohr
C -0.2003259123 0.0000000000 -1.4269296645
C 0.2003259123 0.0000000000 1.4269296645
H -2.2104549009 0.0000000000 -1.9008373239
H 2.2104549009 0.0000000000 1.9008373239
H 0.6485856097 -1.6668156152 -2.3021358629
H 0.6485856097 1.6668156152 -2.3021358629
H -0.6485856097 -1.6668156152 2.3021358629
H -0.6485856097 1.6668156152 2.3021358629
SERVER Lisens-Schrodinger.uit.no 0050569f5819 27008
VENDOR SCHROD PORT=53000
USE_SERVER
title Caffeine for scale testing
integration 6.0
units
length angstrom
end
symmetry Nosym
atoms cartesian
C 1.179579 0.000000 -0.825950
C 2.359623 0.000000 0.016662
C 2.346242 0.000000 1.466600
N 1.092573 0.000000 2.175061
C 0.000000 0.000000 1.440000
N 0.000000 0.000000 0.000000
N 3.391185 0.000000 1.965657
C 4.217536 0.000000 1.154419
N 3.831536 0.000000 0.062646
C 4.765176 -0.384157 -0.964164
C 1.058378 -0.322767 3.578004
O -1.345306 0.000000 1.827493
C -1.260613 -0.337780 -0.608570
O 1.192499 0.000000 -2.225890
H -1.997518 -0.535233 0.168543
H -1.598963 0.492090 -1.227242
H -1.138698 -1.225644 -1.227242
H 0.031688 -0.271264 3.937417
H 1.445570 -1.329432 3.728432
H 1.672014 0.388303 4.129141
H 4.218933 -0.700744 -1.851470
H 5.400826 0.464419 -1.212737
H 5.381834 -1.206664 -0.604809
H 5.288201 0.000000 1.353412
end
BASIS
Type TZ2P
End
geometry
end
scf
end
unrestricted
charge 0 0
xc
gga PW91
end
noprint frag sfo
endinput
#!/bin/bash -l
################### ADF Job Batch Script Example ###################
# Section for defining queue-system variables:
#-------------------------------------
# This script asks for a given set of cores nodes and cores. Stallo has got 16 or 20 cores/node,
# asking for something that adds up to both is our general recommodation (80, 160 etc), you would
# then need to use --ntasks instead of --nodes and --ntasks-per-node. (replace both).
# Runtime for this job is 59 minutes; syntax is hh:mm:ss.
# Memory is set to the maximum advised for a full node, 1500MB/core - giving a total
# of 30000MB/node and leaving some for the system to use. Memory
# can be specified pr core, virtual or total pr job (be carefull).
#-------------------------------------
# SLURM-section
#SBATCH --job-name=adf_runex
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20
#SBATCH --time=00:59:00
#SBATCH --mem-per-cpu=1500MB # Be ware of memory needs, might be a lot higher if you are running Zora basis, for example.
#SBATCH --output=adf_runex.log
#SBATCH --mail-type=ALL
#SBATCH --exclusive
#SBATCH --partition=multinode
######################################
# Section for defining job variables and settings:
input=caffeine # Name of input without extention
ext=adf # We use the same naming scheme as the software default, also for extention
cores=$SLURM_NTASKS # Number of cores potentially used by mpi engine in submit procedure
# We load all the default program system settings with module load:
module --quiet purge
module load ADF/adf2017.108
# Now we create working directory and temporary scratch for the job(s):
# Necessary variables are defined in the notur and the software modules.
export SCM_TMPDIR=/global/work/$USER/$SLURM_JOBID
mkdir -p $SCM_TMPDIR
# Preparing and moving inputfiles to tmp:
submitdir=$SLURM_SUBMIT_DIR
cp ${input}.${ext} $SCM_TMPDIR
cd $SCM_TMPDIR
# In case necessary, set SCM_IOBUFFERSIZE
#export SCM_IOBUFFERSIZE=1024 # Or higher if necessary.
######################################
# Section for running the program and cleaning up:
# Running the program:
time adf -n $cores < ${input}.${ext} > adf_$input.out
# Cleaning up and moving files back to home/submitdir:
# Make sure to move all essential files specific for the given job/software.
cp adf_$input.out $submitdir/adf_$input.out
cp TAPE21 $submitdir/$input.t21
# To zip some of the output might be a good idea!
#gzip $input.t21
#mv $input.t21.gz $submitdir/
# Investigate potentially other files to keep:
echo $(pwd)
echo $(ls -ltr)
# ALWAYS clean up after yourself. Please do uncomment the following line
#cd $submitdir
#rm -r $tempdir/*
echo "Job finished at"
date
################### Job Ended ###################
exit 0
#!/bin/bash -l
################### ADF Job Batch Script Example ###################
# Section for defining queue-system variables:
#-------------------------------------
# This script asks for a given set of cores nodes and cores. Stallo has got 16 or 20 cores/node,
# asking for something that adds up to both is our general recommodation (80, 160 etc), you would
# then need to use --ntasks instead of --nodes and --ntasks-per-node. (replace both).
# Runtime for this job is 59 minutes; syntax is hh:mm:ss.
# Memory is set to the maximum advised for a full node,
# of 30000MB/node and leaving some for the system to use. Memory
# can be specified pr core, virtual or total pr job (be carefull).
#-------------------------------------
# SLURM-section
#SBATCH --job-name=band_runex
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20
#SBATCH --time=00:59:00
#SBATCH --mem-per-cpu=1500MB # Be ware of memory needs!
#SBATCH --output=band_runex.log
#SBATCH --mail-type=ALL
#SBATCH --exclusive
#SBATCH --partition=multinode
######################################
# Section for defining job variables and settings:
input=silicone # Name of input without extention
ext=adf # We use the same naming scheme as the software default, also for extention
cores=40 # Number of cores potentially used by mpi engine in submit procedure
# We load all the default program system settings with module load:
module --quiet purge
module load ADF/adf2017.108
# Now we create working directory and temporary scratch for the job(s):
# Necessary variables are defined in the notur and the software modules.
export SCM_TMPDIR=/global/work/$USER/$SLURM_JOBID
mkdir -p $SCM_TMPDIR
# Preparing and moving inputfiles to tmp:
submitdir=$SLURM_SUBMIT_DIR
tempdir=$SCM_TMPDIR
cd $submitdir
cp ${input}.${ext} $tempdir
cd $tempdir
# In case necessary, set SCM_IOBUFFERSIZE
#export SCM_IOBUFFERSIZE=1024 # Or higher if necessary.
######################################
# Section for running the program and cleaning up:
# Running the program:
time band -n $cores < ${input}.${ext} > band_$input.out
# Cleaning up and moving files back to home/submitdir:
# Make sure to move all essential files specific for the given job/software.
cp band_$input.out $submitdir/band_$input.out
cp TAPE21 $submitdir/$input.t21
# To zip some of the output might be a good idea!
#gzip $input.t21
#mv $input.t21.gz $submitdir/
# Investigate potentially other files to keep:
echo $(pwd)
echo $(ls -ltr)
# ALWAYS clean up after yourself. Please do uncomment the following line
#cd $submitdir
#rm $tempdir/*
#rmdir $tempdir
echo "Job finished at"
date
################### Job Ended ###################
exit 0
#!/bin/bash -l
# Write the account to be charged here
# (find your account number with `cost`)
#SBATCH --account=nnXXXXk
#SBATCH --job-name=daltonexample
# we ask for 20 cores
#SBATCH --ntasks=20
# run for five minutes
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB
# Remove percentage signs to turn on all mail notification
#%%%SBATCH --mail-type=ALL
# You may not place bash commands before the last SBATCH directive!
# Load Dalton
# You can find all installed Dalton installations with:
# module avail dalton
module load Dalton/2013.2
# Define the input files
molecule_file=caffeine.mol
input_file=b3lyp_energy.dal
# Define and create a unique scratch directory
SCRATCH_DIRECTORY=/global/work/${USER}/daltonexample/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}
# Define and create a temp directory
TEMP_DIR=${SCRATCH_DIRECTORY}/temp
mkdir -p ${TEMP_DIR}
# copy input files to scratch directory
cp ${SLURM_SUBMIT_DIR}/${input_file} ${SLURM_SUBMIT_DIR}/${molecule_file} ${SCRATCH_DIRECTORY}
# run the code
dalton -N ${SLURM_NTASKS} -t ${TEMP_DIR} ${input_file} ${molecule_file}
# copy output and tar file to submit dir
cp *.out *.tar.gz ${SLURM_SUBMIT_DIR}
# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}
# happy end
exit 0
#!/bin/bash -l
################### Gaussian Job Batch Script Example ###################
# Section for defining queue-system variables:
#-------------------------------------
# This script asks for a given set of nodes and cores. Stallo has got 16 or 20 cores/node,
# so you need to know what you want.
# Runtime for this job is 59 minutes; syntax is hh:mm:ss.
# Memory is set to the maximum advised for a full node, 1500MB/core - giving a total
# of 30000MB/node and leaving some for the system to use. Memory
# can be specified pr core, virtual or total pr job (be carefull).
#-------------------------------------
# SLURM-section
#SBATCH --job-name=g09_runex
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20
#SBATCH --time=00:59:00
#SBATCH --mem-per-cpu=1500MB
#SBATCH --output=g09_runex.log
#SBATCH --mail-type=ALL
#SBATCH --exclusive
#SBATCH --partition=gaussian # This is a particular gasussian partition due to some technicalities with SLURM.
######################################
# Section for defining job variables and settings:
input=caffeine # Name of input without extention
ext=com # We use the same naming scheme as the software default extention
# We load all the default program system settings with module load:
module --quiet purge
module load Gaussian/09.d01
# Now we create working directory and temporary scratch for the job(s):
# Necessary variables are defined in the notur and the software modules.
export GAUSS_SCRDIR=/global/work/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR
# Preparing and moving inputfiles to tmp:
submitdir=$SLURM_SUBMIT_DIR
tempdir=$GAUSS_SCRDIR
cp $submitdir/${input}.${ext} $tempdir
cd $tempdir
# Preparation of inputfile is done by G09.prep in folder $g09tooldir
# If you want to inspect it, cd $g09tooldir after loading the gaussian module
Gaussian.prep $input
######################################
# Section for running the program and cleaning up:
# Running the program:
time g09 < ${input}.${ext} > gaussian_$input.out
# Cleaning up and moving files back to home/submitdir:
# Make sure to move all essential files specific for the given job/software.
cp gaussian_$input.out $submitdir
cp $input.chk $submitdir
# To zip some of the output might be a good idea!
#gzip $resultszip
#mv $resultzip.gz $SUBMITDIR/
# Investigate potentially other files to keep:
echo $(pwd)
echo $(ls -ltr)
# ALWAYS clean up after yourself. Please do uncomment the following line
#cd $submitdir
#rm $tempdir/*
#rmdir $tempdir
echo "Job finished at"
date
################### Job Ended ###################
exit 0
#!/bin/bash -l
################### VASP Job Batch Script Example ###################
# Section for defining queue-system variables:
#-------------------------------------
# This script asks for a given set of cores nodes and cores. Stallo has got 16 or 20 cores/node,
# asking for something that adds up to both is our general recommodation (80, 160 etc), you would
# then need to use --ntasks instead of --nodes and --ntasks-per-node. (replace both).
# Runtime for this job is 59 minutes; syntax is hh:mm:ss.
# Memory is set to the maximum advised for a full node, 1500MB/core - giving a total
# of 30000MB/node and leaving some for the system to use. Memory
# can be specified pr core, virtual or total pr job (be carefull).
#-------------------------------------
# SLURM-section
#SBATCH --job-name=molcas_runex
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
##SBATCH --ntasks=20
#SBATCH --time=00:59:00
#SBATCH --mem-per-cpu=1500MB
#SBATCH --output=molcas_runex.log
#SBATCH --mail-type=ALL
#SBATCH --exclusive
######################################
# Section for defining job variables and settings:
proj=ehtane # Name of project/folder
input=C2H6 # Name of job input
# We load all the default program system settings with module load:
module --quiet purge
module load Molcas/molcas82-intel-2015a
# You may check other available versions with "module avail Molcas"
# Now we create working directory and temporary scratch for the job(s):
# Necessary variables are defined in the notur and the software modules.
export MOLCAS_WORKDIR=/global/work/$USER/$SLURM_JOBID
mkdir -p $MOLCAS_WORKDIR
# Preparing and moving inputfiles to tmp:
submitdir=$SLURM_SUBMIT_DIR
tempdir=$MOLCAS_WORKDIR
cd $submitdir
cp ${input}.* $tempdir
cd $tempdir
######################################
# Section for running the program and cleaning up:
# Running the program:
time molcas Project=${proj} -f ${input}*.input
# Cleaning up and moving files back to home/submitdir:
# Make sure to move all essential files specific for the given job/software.
cp * $submitdir
# To zip some of the output might be a good idea!
#gzip $resultszip
#mv $resultzip.gz $SUBMITDIR/
# Investigate potentially other files to keep:
echo $(pwd)
echo $(ls -ltr)
# ALWAYS clean up after yourself. Please do uncomment the following line
#cd $submitdir
#rm $tempdir/*
#rmdir $tempdir
echo "Job finished at"
date
################### Job Ended ###################
exit 0
#!/bin/bash -l
#SBATCH --job-name=turbomole
#SBATCH --account=nnXXXXk
#SBATCH --time=1:00:00
#SBATCH --mem-per-cpu=1000MB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --mail-type=ALL
#SBATCH --partition=multinode
# For TURBOMOLE v7.2:
# --nodes : scale by the size of the calculation
# --ntasks-per-node : always a singel task per node
# --cpus-per-task : always full nodes
# load modules
module --quiet purge
module load TURBOMOLE/7.2
# TURBOMOLE needs this variable
export TURBOTMPDIR=/global/work/${USER}/${SLURM_JOBID}
workdir=${TURBOTMPDIR}
submitdir=${SLURM_SUBMIT_DIR}
# copy files to work
mkdir -p $workdir
cd $workdir
cp $submitdir/* $workdir
# set parallel environment
export PARNODES=${SLURM_NTASKS}
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# run jobex calculation
dscf > dscf.out
# copy files back
mv * $submitdir
# clean up
rm -r $workdir
exit 0
#!/bin/bash -l
################### VASP Job Batch Script Example ###################
# Section for defining queue-system variables:
#-------------------------------------
# This script asks for a given set of nodes and cores. cores. Stallo has got
# 16 or 20 cores/node, so you need to know what you want.
# Runtime for this job is 59 minutes; syntax is hh:mm:ss.
# Memory si set to the maximum amound adviced when asking for a full node, it
# adds up to 30 000MB, leaving a small part for the system to use. Memory
# can be specified pr core, virtual or total pr job (be carefull).
#-------------------------------------
# SLURM-section
#SBATCH --job-name=vasp_runex
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20
#SBATCH --time=02:00:00
##SBATCH --mem-per-cpu=1500MB
#SBATCH --output=vasp_runex.log
#SBATCH --mail-type=ALL
#SBATCH --exclusive
#SBATCH --partition=multinode
######################################
# Section for defining job variables and settings:
# When you use this test script, make sure, your folder structure is as follows:
# ./job_vasp.sh
# ./CeO2job/INCAR
# ./CeO2job/KPOINTS
# ./CeO2job/POTCAR
# ./CeO2job/POSCAR
proj=CeO2job # Name of job folder
input=$(ls ${proj}/{INCAR,KPOINTS,POTCAR,POSCAR}) # Input files from job folder
# We load all the default program system settings with module load:
module --quiet purge
module load VASP/5.4.1.plain-intel-2016a
# You may check other available versions with "module avail VASP"
# Now we create working directory and temporary scratch for the job(s):
# Necessary variables are defined in the notur and the software modules.
export VASP_WORKDIR=/global/work/$USER/$SLURM_JOB_ID
mkdir -p $VASP_WORKDIR
# Preparing and moving inputfiles to tmp:
submitdir=$SLURM_SUBMIT_DIR
cp $input $VASP_WORKDIR
cd $VASP_WORKDIR
######################################
# Section for running the program and cleaning up:
# Running the program:
time mpirun vasp_std
# Cleaning up and moving files back to home/submitdir:
# Make sure to move all essential files specific for the given job/software.
cp OUTCAR $submitdir/${proj}.OUTCAR
# To zip some of the output might be a good idea!
#gzip results.gz OUTCAR
#mv $results.gz $submitdir/
# Investigate potentially other files to keep:
echo $(pwd)
echo $(ls -ltr)
# ALWAYS clean up after yourself. Please do uncomment the following line
# If we have to, we get really grumpy!
#cd $submitdir
#rm -r $VASP_WORKDIR/*
echo "Job finished at"
date
################### Job Ended ###################
exit 0
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment