Usage of the Cluster

For computations that need large amounts of memory or computation time, two clusters with currently up to 32 GB of memory per node are directly accessible from the desktop computers of the Department of Mathematics.
They are largely software-compatible with the desktop computers of the department and provide a simple way to access computing resources beyond what a desktop computer offers.
Even larger resources and newer hardware are available on the compute servers, but you will need to apply for a separate account to use them.
This description only covers the first steps of using the part that is available without a separate account.
A more detailed description in German is available at
 Documentation of the Compute Servers.

On this page you will find detailed instructions for using the cluster directly from a computer in the Department of Mathematics.

Technical Data

Cluster 1

Attribute                               Value
Number of nodes (directly accessible)   currently 13
Processors (per node)                   2 x Opteron 252 (2.6 GHz)
Memory (per node)                       4 GB
Network (internal)                      InfiniBand + Gigabit Ethernet
Bandwidth                               ~800 MB/s (MPI)
Network (external)                      Gigabit Ethernet

Cluster 2

Attribute                               Value
Number of nodes (directly accessible)   currently 17
Processors (per node)                   2 x dual-core Opteron 2218 (2.6 GHz)
Memory (per node)                       16 GB (1 node with 32 GB)
Network (internal)                      InfiniBand + Gigabit Ethernet
Bandwidth                               ~1500 MB/s (MPI)
Network (external)                      Gigabit Ethernet

Submitting programs

For the simplest use of the cluster, especially for interactive programs with user input, you can use the cluster command. Examples:

% cluster matlab -nosplash -nodisplay < inp

or simply:

% cluster matlab

(In the examples above and below, % is the command prompt of the shell.)

The cluster command starts programs in the same directory from which it is called, provided that enough resources are available on the cluster.

Otherwise cluster terminates with an error message. In that case you can submit your program to the queue in batch mode. The program then waits in the queue and starts as soon as the necessary resources are available on the cluster. For interactive programs as in the second example this doesn't make sense, but in the first example it does. How it is done is illustrated in the following examples:

1. With the option -now no:

% cluster -now no matlab -nosplash -nodisplay < inp

2. By writing a small job script and submitting it with qsub:

% cat > matlab.job <<EOF
#!/bin/tcsh
#$ -cwd
#$ -N matlab
#$ -o matlab.out
#$ -j y
matlab -nosplash -nodisplay < inp
EOF

% qsub matlab.job

Of course you can use your preferred text editor instead of 'cat' for writing job scripts.

The difference between these two methods is that in the second case the job runs entirely independently of your terminal session: you can, for example, log out of your computer and inspect the results the next day. In the first case the terminal window must not be closed.
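
For example, after submitting the job script from above with qsub you can log out; later you can check whether the job is still in the queue and look at its output (the file names match the job script above):

% qstat
% cat matlab.out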

Format of a job script

Here is the example from above once again, this time with line numbers and a few more parameters:

 1   #!/bin/tcsh
 2   #$ -cwd
 3   #$ -N matlab
 4   #$ -o matlab.out
 5   #$ -j y
 6   #$ -l h_rt=86400
 7   #$ -l mem_free=2G
 8   #$ -m be
 9   #$ -M myself@math.tu-berlin.de
10   matlab -nosplash -nodisplay < inp


Explanation:
2) changes into the directory from which the job was submitted
3) job name
4) output file
5) 'join' = yes, i.e. write error messages and normal output into the same output file
6) run time limit (in seconds)
7) memory requirement of the job
8) send a mail at the beginning and at the end of the job
9) mail address (please provide this!)

We strongly advise you to provide a run time limit and the memory requirement of your job. The scheduler schedules short jobs first, and if no limit is specified by the user the job is automatically terminated after 12 hours.
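
Note that the run time limit can usually also be given in the form hours:minutes:seconds (this assumes the standard Grid Engine syntax for h_rt); line 6 of the script above could then be written as:

#$ -l h_rt=24:00:00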

All of the above parameters can also be given as arguments to qsub. In that case they override the settings from the job script:

% qsub -N test -l h_rt=80000 jobscript
% cluster -N test2 -pe mp 2 matlab -nosplash -nodisplay < inp

Jobs with large memory requirements, multi-threaded programs

Programs that are going to use all of the memory of a node (4 GB, 16 GB or 32 GB) or that work with several threads or processes should reserve a cluster node with two or four processors exclusively. This can be achieved by adding one of the following lines to the job script:

#$ -pe mp 2 -l cluster1

or

#$ -pe mp 4 -l cluster2

In the first case the job runs on a 4 GB node with 2 processors; in the second case on a node with at least 16 GB and 4 processor cores.
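
For illustration, here is a minimal sketch of a complete job script that reserves a full Cluster 2 node for a multi-threaded program; the program name ./mythreadedprog and the use of OMP_NUM_THREADS are assumptions for this example:

#!/bin/tcsh
#$ -cwd
#$ -N threads
#$ -o threads.out
#$ -j y
#$ -l h_rt=86400
#$ -pe mp 4 -l cluster2

# assumption: ./mythreadedprog is a placeholder for an OpenMP program
# that takes its thread count from OMP_NUM_THREADS
setenv OMP_NUM_THREADS $NSLOTS
./mythreadedprog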

Monitoring of the queue and of the cluster usage

The command qstat shows the currently running and waiting jobs:

% qstat
job-ID  prior    name      user      state  submit/start at      queue         slots  ja-task-ID
--------------------------------------------------------------------------------------------------
  4636  0.50500  matlab    zeppo     r      08/31/2006 22:19:48  all.q@node02      2
  4531  0.50500  comsol    pippo     r      08/25/2006 18:05:38  all.q@node04      2
  4632  0.50500  meinprog  blump     r      08/31/2006 10:12:18  all.q@node07      2
  4621  0.55500  mpitest   npasched  qw     09/01/2006 08:44:36                   10
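
If you also want to see the state of the individual queues and nodes, you can ask for the full queue listing (a standard Grid Engine option of qstat):

% qstat -f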

Changing attributes and deletion of jobs

Some of the job parameters can be changed after submission of the job, partially even while the job is already running. This can be done with the command qalter. It accepts most of the parameters of qsub and sets them for the given job ID.

Jobs that have a wrong setup or that should be removed for some other reason can be deleted with the command qdel, giving the job ID as argument.

An example:
% qstat -u npasched
job-ID  prior    name     user      state  submit/start at      queue  slots  ja-task-ID
-------------------------------------------------------------------------------------------
  4621  0.55500  mpitest  npasched  qw     09/01/2006 08:44:36            10
% qalter -m a 4621
modified mail options of job 4621
% qdel 4621
npasched has deleted job 4621
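
For example, to change the run time limit of the waiting job before it starts (qalter accepts the -l syntax of qsub; the new limit is only an illustration):

% qalter -l h_rt=43200 4621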


Array jobs

Sometimes a particular job has to be run with a large number of different data sets. The straightforward method of writing n slightly different job scripts and submitting them to the batch system quickly becomes annoying for n > 3.

A more elegant method is so-called job arrays, where one job script is submitted with the instruction to run it in n copies. A qsub command to achieve that looks like this:
% qsub -t 10-30:2 jobscript

The command above submits the script jobscript to the batch system and generates 11 copies, each of which is given a so-called task ID in the range [10..30] with step size 2, that is 10, 12, 14, 16, ..., 30.
In the job script the task ID is available in two places:

1. In the script header:
Here you can, for example, add the task ID to the name of the output file, so that each copy writes into its own file:

#$ -o job.$TASK_ID.out

2. In the script itself:

Here the task ID is available through the environment variable $SGE_TASK_ID. It can be used by the script itself or by processes started from the script:

#!/bin/tcsh
#$ -cwd
#$ -N matlab_run
#$ -o matlab_run.$TASK_ID.out
#$ -j y
#$ -m be
#$ -M myself@math.tu-berlin.de

matlab -nosplash -nodisplay < input.$SGE_TASK_ID.m

Here is an example of a Matlab input file that reads the task ID directly:

task_id = str2num( getenv('SGE_TASK_ID') )

x = floor( task_id / 160 )
y = task_id - x * 160

.....

More environment variables are:

  • $SGE_TASK_LAST : the last task ID
  • $SGE_TASK_STEPSIZE : the step size
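
For illustration, here is a minimal sketch of a job script that uses these variables to process a block of numbered input files in each task; the file names data.1, data.2, ... and the program ./myprog are placeholders for this example:

#!/bin/tcsh
#$ -cwd
#$ -N chunks
#$ -o chunks.$TASK_ID.out
#$ -j y
#$ -t 1-100:10

# ./myprog and the data.* files are placeholders; each task handles the
# files from data.$SGE_TASK_ID up to the start of the next block
@ last = $SGE_TASK_ID + $SGE_TASK_STEPSIZE - 1
set i = $SGE_TASK_ID
while ( $i <= $last )
    ./myprog < data.$i
    @ i++
end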




Parallel programs with MPI

For parallel programs with MPI one should use a job script similar to the following:

#!/bin/tcsh
#$ -cwd
#$ -pe ompi* 4
#$ -N mpitest
#$ -o mpitest.out
#$ -j y
#$ -m be
#$ -M myself@math.tu-berlin.de

module add ompi-1.2.2

mpirun -np $NSLOTS myprog

The 4 in the -pe line stands for the number of processors (processor cores on multicore systems). It is also possible to request a range of processors:

#$ -pe ompi* 2-8

The request above starts the job with between 2 and 8 processors, depending on how many are available. The allocated number is available in the script through the environment variable $NSLOTS.

The "Parallel Environments" ompi* requested with -pe in the example above are some kind of arrangement of groups for the queues.

They determine how the processes are distributed on the nodes when more then one queue slot is requested. There exist quite a few of these ompi* PEs. The '*' in the request above means: Take anyone that begins with 'ompi'.

The following list shows all available PEs:


PE name  Cluster  Processes per node
mp       *        n
mpi1     *        1
mpi2     *        2
mpi4     *        4
mpi      *        fill
ompi1_1  1        1
ompi1_2  1        2
ompi1_n  1        fill
ompi2_1  2        1
ompi2_2  2        2
ompi2_4  2        4
ompi2_n  2        fill


Explanation:

n: the number given in the -pe request
fill: a node gets "filled up" with processes, then the next one is used
*: any cluster

The -pe request could also look like this:

#$ -pe mpi1 2-8

For programs compiled with OpenMPI you should only use the PEs whose names begin with 'ompi'.

The PEs with names 'mpi*' are for programs that use Ethernet-based MPI.
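
For example, according to the table above a request like the following places two processes on each node of Cluster 2, so that 8 slots are spread over 4 nodes:

#$ -pe ompi2_2 8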

Development tools

There are some additional compilers installed for the cluster and the other 64-bit computers that often achieve better performance than the normal gcc versions.

Here is a synopsis:

Manufacturer   Compiler  Programming language     Installed versions                        Module
Gnu            gcc       C89, C99 1)              4.1.2
               g++       ISO C++ 89               4.1.2
               g77       Fortran77                4.1.2
               gfortran  Fortran77, Fortran90     4.1.2
Intel          ifort     Fortran77, Fortran90 1)  9.0.25, 9.1.36, 10.1.018, 11.0.069        ifc*
               icc       C89, C99 1)              9.0.23, 9.1.42, 10.1.018, 11.0.069        icc*
               icpc      ISO C++ 89               9.0.23, 9.1.42, 10.1.018, 11.0.069        icc*
PathScale 2)   pathCC    ISO C++ 89               2.0, 2.1, 2.2.1, 2.3, 2.4, 2.5, 3.0, 3.1  pathscale-*
               pathcc    ISO C89, C99             2.0, 2.1, 2.2.1, 2.3, 2.4, 2.5, 3.0, 3.1  pathscale-*
               pathf77   Fortran77                2.0, 2.1, 2.2.1, 2.3, 2.4, 2.5, 3.0, 3.1  pathscale-*
               pathf90   Fortran90                2.0, 2.1, 2.2.1, 2.3, 2.4, 2.5, 3.0, 3.1  pathscale-*
               pathf95   Fortran95                2.0, 2.1, 2.2.1, 2.3, 2.4, 2.5, 3.0, 3.1  pathscale-*
Portland       pgcc      C89                      6.0.5, 6.1, 6.2.6, 7.0-4, 7.1-1, 7.2-4    pgi-*
               pgCC      ISO C++ 89               6.0.5, 6.1, 6.2.6, 7.0-4, 7.1-1, 7.2-4    pgi-*
               pgf77     Fortran77                6.0.5, 6.1, 6.2.6, 7.0-4, 7.1-1, 7.2-4    pgi-*
               pgf90     Fortran90                6.0.5, 6.1, 6.2.6, 7.0-4, 7.1-1, 7.2-4    pgi-*
               pgf95     Fortran95                6.0.5, 6.1, 6.2.6, 7.0-4, 7.1-1, 7.2-4    pgi-*

1) partially
2) no licences available since Nov. 08


Not all compilers are available in the standard search path. If you need a specific version, you can set the respective environment variables with the module command. An example for the Intel compiler:

% module add icc11.0.069

You can see the name of the module in the table above and complete it with the version number. With module avail you get a list of all available modules:

% module avail

------------------- /usr/share/modules/modulefiles -----------------------
dot         ifc9.1.36                mvapich-pgi-0.9.8  use.own
g03         lahey8.0                 null
gaussian03  module-cvs               pathscale-2.0
gcc402      module-info              pathscale-2.1
gcc411      modules                  pathscale-2.2.1
gridengine  mpich-ch-p4              pathscale-2.3
icc9.0      mpich-ch-p4mpd           pathscale-2.3.1
icc9.0.23   mvapich-gcc              pathscale-2.4
icc9.1      mvapich-gcc-0.9.8        pathscale-2.5
icc9.1.42   mvapich-intel-0.9.8      pgi-6.0.5
ifc9.0      mvapich-pathscale        pgi-6.1
ifc9.0.25   mvapich-pathscale-0.9.8  pgi-6.2.5
ifc9.1      mvapich-pgi              starcd-3.24


Not all modules listed are about compilers. You can get information about a particular module with:

% module help pgi-6.0.5

----------- Module Specific Help for 'pgi-6.0.5' ------------------

Sets up the paths you need to use the Portland 6.0.5 compiler suite as the default

If you use a certain module very often, you can add the respective module commands to your ~/.cshrc or ~/.bashrc. For example:

% tail ~/.cshrc

if ( ${?MODULESHOME} ) then
    module load pathscale-2.4 gcc411
endif

# end of .cshrc
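
For bash users the corresponding lines in ~/.bashrc could look like this (a sketch along the same lines, assuming the module command is set up in the same way):

% tail ~/.bashrc

if [ -n "$MODULESHOME" ]; then
    module load pathscale-2.4 gcc411
fi

# end of .bashrc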


Development tools for MPI programs

To compile MPI programs you should use the MPI compiler wrappers mpicc and mpif77. These wrappers call one of the compilers mentioned above and link the correct MPI libraries. The relevant modules are:

Module name           Explanation
ompi-gcc-1.2.2        OpenMPI 1.2.2 with gcc
ompi-pathscale-1.2.2  OpenMPI 1.2.2 with PathScale
ompi-pgi-1.2.2        OpenMPI 1.2.2 with Portland

Relevant manual pages

cluster, qrsh, qsub, qstat, qalter.

Support/contact information

The clusters are supported by

  • Norbert Paschedag, MA 368, Tel. 314 29264
  • Kai Waßmuß, MA 368, Tel.: 314 29283


For any questions regarding the cluster or problems with its usage, please send a mail to clust_staff.
