Login hosts
The following hosts are available for login:
- cluster-a.math.tu-berlin.de
- cluster-i.math.tu-berlin.de
- cluster-g.math.tu-berlin.de
The address
- cluster.math.tu-berlin.de
points to one of the above login hosts (currently cluster-i).
cluster-a has an AMD processor, cluster-i has an Intel processor. For many tasks and for submitting jobs this is not relevant, but compilers can be instructed to optimize the code for the current architecture.
Please don't compute on the login hosts. Their purpose is editing and submitting jobs, and many people are logged in there simultaneously (see below).
cluster-g is a node with an Intel processor that additionally has two Tesla C1060 GPU cards. More about this in the paragraph about the GPU cluster.
You can access the /scratch partitions of the nodes via:
/net/nodename.scratch/
that is for example
% ls /net/node008.scratch/
Batch system
The clusters are operated exclusively via the batch system, i.e. jobs are written as a job script and are then submitted with the command 'qsub'.
A small job script might be generated and submitted like this:
% cat > myjob.job <<EOF
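# (illustrative continuation of the example; the directives, file names and program are placeholders)
#$ -cwd
#$ -N myjob
#$ -o myjob.out
./myprog
EOF
% qsub myjob.job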
Instead of 'cat' you can of course use your favourite text editor to edit job scripts.
The job starts when the requested resources are available. That can be immediately, or for special requests after a few days. The user can choose to receive an email to be informed when the job starts or ends.
Format of a job script
Once again the last example, this time with line numbers and some more parameters:
1 #!/bin/tcsh
2 #$ -cwd
3 #$ -N myjob
4 #$ -o myjob.out
5 #$ -j y
6 #$ -l h_rt=80000
7 #$ -m be
8 #$ -M myself@math.tu-berlin.de
Explanation:
2) change into the directory from where the job has been submitted
3) job name
4) output file
5) 'Join'=yes, i.e. write both error messages and output into the output file
6) maximum run time for the job in seconds
7) write a mail at start and end of the job
8) mail address (please provide this !!)
We strongly advise you to provide a run time limit and the memory requirement of your job. The scheduler schedules short jobs first, and if the user specifies no run time limit the job will automatically be terminated after 12 hours. The maximum run time limit is currently 220 hours (as of 11/2011). Longer run time limits are possible on request, but discouraged: they complicate the maintenance of the cluster, and job abortions due to power outages or other errors become more probable the longer a job runs.
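Inside a job script the same requests can be written as directives. A short sketch using the values from the qsub call below (the numbers are only examples):
#$ -l h_rt=80000
#$ -l mem_free=4G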
All job parameters can also be given as arguments to the qsub command; in case of conflicts they override the parameters from the job file:
% qsub -N test -l h_rt=80000 -l mem_free=4G jobscript
You can display the available parameters by calling
% qconf -sc
Jobs that use a node exclusively
Each node has a number of slots, which correspond to the number of processor cores on that node. Usually the batch system assigns one job to each of the slots.
If you want to use a node exclusively, you can add the line
#$ -l exclusive
to your job script.
The job is then executed on a free cluster node (as soon as one is available).
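Like any other parameter, the request can also be given directly on the qsub command line instead of in the script:
% qsub -l exclusive jobscript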
Monitoring of the queue and of the cluster usage
The command qstat shows running and queued jobs:
% qstat
Options like '-u username' or '-s r' restrict the list. See 'man qstat'.
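For example, to list only your own jobs that are currently running, the two options can be combined:
% qstat -u $USER -s r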
Changing attributes and deletion of jobs
Some of the job parameters can be changed after submission of the job, in some cases even while the job is already running. This can be done with the command qalter. It accepts most of the parameters of qsub and sets these parameters for the given job ID.
Jobs that have a wrong setup or that should be removed for some other reason can be deleted with the command qdel, with their job ID as argument.
An example:
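Assuming a waiting job with the (illustrative) ID 123456, its run time limit could be raised and the job later deleted like this:
% qalter -l h_rt=100000 123456
% qdel 123456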
Array jobs
Sometimes a particular job has to be run with a large set of different data sets. The straightforward method of just writing n slightly different job scripts and submitting them to the batch system surely becomes annoying for n > 3.
A more elegant method is so-called job arrays, where one job script is submitted with the instruction to run in n copies. A qsub command to achieve that looks like this:
% qsub -t 10-30:2 jobscript
The command above submits the script jobscript to the batch system and generates 11 copies that are each given a so-called TASK_ID in the range [10..30] with step size 2, that is 10 12 14 16 ...
In the job script the task ID is available at two places:
1. In the script header:
Here you can for example add the task ID to the name of the output file, so that each copy writes into its own file:
#$ -o job.$TASK_ID.out
2. In the script itself:
Here the task ID is available through the environment variable $SGE_TASK_ID. It can be used by the script itself or by processes started from the script, for example:
#!/bin/tcsh
#$ -cwd
#$ -N matlab_run
#$ -o matlab_run.$TASK_ID.out
#$ -j y
#$ -m be
#$ -M myself@math.tu-berlin.de
matlab -nosplash -nodisplay < input.$SGE_TASK_ID.m
Here is an example of a Matlab input file that reads the task ID directly:
task_id = str2num( getenv('SGE_TASK_ID') )
x = floor( task_id / 160 )
y = task_id - x * 160
.....
More environment variables are:
- $SGE_TASK_LAST : the last task ID
- $SGE_TASK_STEPSIZE : the step size
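These variables make it possible to let each task work on a whole block of inputs instead of a single one. A sketch in tcsh, assuming input files data.10, data.11, ... and a program ./process (both illustrative):
# each task handles the IDs from its own $SGE_TASK_ID up to (but not including) the next task's ID
@ last = $SGE_TASK_ID + $SGE_TASK_STEPSIZE - 1
if ( $last > $SGE_TASK_LAST ) @ last = $SGE_TASK_LAST
foreach i ( `seq $SGE_TASK_ID $last` )
    ./process data.$i
end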
Disk space
Each job gets a temporary directory where files generated by the job can be written. You can read the path of this directory from the environment variable $TMPDIR in your job script. It is located on the local hard disk of the node where the job is executed. At the end of the job this directory is deleted, so you should copy any data you need later to another directory before the job ends.
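A sketch of how a job script might use this directory (program name and target directory are illustrative):
#!/bin/tcsh
#$ -cwd
cd $TMPDIR                      # work on the local disk of the execution node
$HOME/bin/myprog > result.dat   # illustrative program writing its scratch output here
cp result.dat $HOME/results/    # copy what you still need before the directory is deleted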
Smaller amounts of data can be written to the home directory. A backup job runs on the home directory each night. The quota for the home directory is however somewhat restrictive. While you can request a bigger quota, it will not be possible to store gigabyte-sized files there.
For larger files there is a directory /work/$USER.
Note that the data from this directory are not included in any backup!
Furthermore there is the directory /lustre. It is similar in size to the /work directory but should be faster. When a job generates large amounts of data, it can be written to this directory without generating load on the fileservers that serve the /work directory.
Note however that you should not store data on /lustre permanently.
The system administration might delete older data from /lustre from time to time.
Parallel programs with MPI
For parallel programs with MPI one should use a job script similar to the following:
#!/bin/tcsh
#$ -cwd
#$ -pe ompi* 4
#$ -N mpitest
#$ -o mpitest.out
#$ -j y
#$ -m be
#$ -M myself@math.tu-berlin.de
module add ompi-1.2.2
mpirun -np $NSLOTS myprog
The 4 at the end of the -pe line stands for the number of processors (processor cores on multicore systems). It is also possible to request a range of processors:
#$ -pe ompi* 2-8
The request above starts the job with between 2 and 8 processors, depending on how many are available. The allocated number is available in the script through the environment variable $NSLOTS.
The "Parallel Environments" (ompi*) requested with -pe in the example above are a kind of grouping of the queues.
They determine how the processes are distributed over the nodes when more than one queue slot is requested. There are quite a few of these ompi* PEs. The '*' in the request above means: take any PE whose name begins with 'ompi'.
The following list shows the pattern for the names of the PEs:
name of PE | cluster | processes/nodes |
---|---|---|
mp | * | n |
mpi1 | * | 1 |
mpi2 | * | 2 |
mpi4 | * | 4 |
mpi | * | fill |
ompi1_1 | 1 | 1 |
ompi1_2 | 1 | 2 |
ompi1_n | 1 | fill |
ompi2_1 | 2 | 1 |
ompi2_2 | 2 | 2 |
ompi2_4 | 2 | 4 |
ompi2_n | 2 | fill |
ompi3_1 | 3 | 1 |
ompi3_2 | 3 | 2 |
ompi3_4 | 3 | 4 |
ompi3_n | 3 | fill |
etc. |
The following list shows the number of slots per node for each cluster:
cluster | slots per node |
---|---|
1 | 2 |
2 | 4 |
3 | 4 |
4 | 4 |
5 | 2 |
6 | 16 |
7 | 8 |
8 | 8 |
9 | 4 |
10 | 8 |
11 | 12 |
12 | 8 |
The entries in the PE list above mean:
n | as stated |
fill | A node gets processes until its slots are filled, then the next node is filled |
* | anything |
The -pe parameter might also look like this:
#$ -pe mpi1 2-8
For programs compiled with OpenMPI you should only use the PEs whose name begins with ompi*.
The PEs with name 'mpi*' are for programs that use ethernet based MPI.
A list of host names is available in the file
$PE_HOSTFILE
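If you want to see which hosts were allocated to your job, the job script can simply print this file into the job output (one line per allocated host):
cat $PE_HOSTFILE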
Development tools
There are some additional compilers installed for the cluster and the other 64-bit computers that often achieve better performance than the normal gcc versions.
Here is a synopsis:
manufacturer | name of the compiler | programming language | installed versions | module |
---|---|---|---|---|
GNU | gcc | C89, C99 1) | 4.3.3 | |
 | g++ | ISO C++ 89 | 4.3.3 | |
 | g77 | Fortran77 | 4.3.3 | |
 | gfortran | Fortran77, Fortran90 | 4.3.3 | |
Intel | ifort | Fortran77, Fortran90 1) | 9.0.25, 9.1.36, 10.1.018, 11.0.069, 11.1.064 | ifc-* |
 | icc | C89, C90 1) | 9.0.23, 9.1.42, 10.1.018, 11.0.069, 11.1.064 | icc-* |
 | icpc | ISO C++ 89 | 9.0.23, 9.1.42, 10.1.018, 11.0.069, 11.1.064 | icc-* |
PathScale 2) | pathCC | ISO C++ 89 | 2.0, 2.1, 2.2.1, 2.3, 2.3.1, 2.4, 2.5, 3.0, 3.1, 3.2 | pathscale-* |
 | pathcc | ISO C89, C99 | | |
 | pathf77 | Fortran77 | | |
 | pathf90 | Fortran90 | | |
 | path95 | Fortran95 | | |
Portland | pgcc | C89 | 8.0-2, 8.0-6, 9.0-1, 10.0 | pgi-* |
 | pgCC | ISO C++ 89 | | |
 | pgf77 | Fortran77 | | |
 | pgf90 | Fortran90 | | |
 | pgf95 | Fortran95 | | |
1) partially
2) There are no licences available for the PathScale compilers anymore, but the run time environment can still be used.
Not all compilers are available in the standard search path. If you need a special version, you can set the respective environment variables with the module command. An example for the Intel compiler:
% module add icc-11.0.069
You can see the name of the module in the table above and complete it with the version number. With 'module avail' you get a list of all available modules:
% module avail
Not all modules listed are about compilers. You can get information about a particular module with:
% module help pgi-6.0.5
If you use a certain module very often, you can add the respective module commands to your ~/.cshrc or ~/.bashrc. For example:
% tail ~/.cshrc
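The relevant lines in such a file could then look like this (the module names are examples taken from this page):
module add icc-11.0.069
module add ompi-gcc-1.3.2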
Development tools for MPI programs
To compile MPI programs you should use the MPI compiler wrappers mpicc, mpif77, etc. These wrappers call one of the compilers mentioned above and link the correct MPI libraries. A default version of OpenMPI is already available in the search path; other versions are available via modules. The relevant modules are:
module | content |
---|---|
ompi-gcc-1.2.2 | OpenMPI 1.2.2 for gcc |
ompi-pgi-1.2.2 | OpenMPI 1.2.2 for Portland |
ompi-gcc-1.2.4 | OpenMPI 1.2.4 for gcc |
ompi-pgi-1.2.4 | OpenMPI 1.2.4 for Portland |
ompi-gcc-1.3.2 | OpenMPI 1.3.2 for gcc |
ompi-pgi-1.3.2 | OpenMPI 1.3.2 for Portland |
The compilation of an MPI program then works like this:
% mpicc -o myprog myprog.c
% mpif90 -o myprog myprog.f
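If you want to check which underlying compiler and libraries a wrapper uses, the OpenMPI wrappers accept the --showme option, which prints the command line they would execute:
% mpicc --showme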
Contact information
The clusters are supported by
- Norbert Paschedag, MA 368, Tel. 314 29264
- Kai Waßmuß, MA 368, Tel. 314 29283
For any questions regarding the cluster or problems with its usage, please send an email to the support (clust_staff).