 
 
 
 
 
 
 
  
 Next: About this document ...
 Up: 3 Parallelism
 Previous: 3.3 Parallelization levels
3.4 Tricks and problems
Many problems in parallel execution derive from mixing up different
MPI libraries and runtime environments. There are two major MPI
implementations, Open MPI and MPICH, each available in various versions
that are not necessarily compatible, plus vendor-specific implementations
(e.g. Intel MPI). A parallel machine may have multiple parallel
compilers (typically, mpif90 scripts calling different
serial compilers), multiple MPI libraries, multiple launchers
for parallel codes (different versions of mpirun and/or
mpiexec). You have to figure out the proper combination
of all of the above, which may require using the module command
or manually setting environment variables and execution paths.
What exactly has to be done depends upon the configuration of your
machine. You should inquire with your system administrator or user
support (if available; if not, YOU are the system administrator
and user support and YOU have to solve your problems).
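As a first diagnostic, it can help to see which MPI compiler wrapper and launchers your shell actually picks up. The following is a minimal sketch, not an official QUANTUM ESPRESSO tool: the function name report_mpi_tools is hypothetical, and it only assumes a POSIX shell and the usual tool names mpif90, mpirun and mpiexec.

```shell
#!/bin/sh
# Print the location of the MPI compiler wrapper and of the launchers
# that come first in PATH, so that mismatched installations are easy to spot.
# (report_mpi_tools is a hypothetical helper name.)
report_mpi_tools() {
    for tool in mpif90 mpirun mpiexec; do
        if location=$(command -v "$tool" 2>/dev/null); then
            printf '%-8s %s\n' "$tool" "$location"
        else
            printf '%-8s %s\n' "$tool" "not found in PATH"
        fi
    done
}

report_mpi_tools
```

If the reported paths point into different installation directories, the wrapper and the launcher probably belong to different MPI libraries. Running mpirun --version, and mpif90 -show (MPICH-derived wrappers) or mpif90 --showme (Open MPI wrappers), should then tell which implementation each one comes from.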
Always verify whether your executable was actually compiled for
parallel execution: this is stated in the first lines
of the output. Launching a serial executable with mpirun or mpiexec
simply runs several instances of the serial code, which produces strange crashes.
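The check on the first lines of output can be scripted. The sketch below is only illustrative: the function name check_parallel_header is hypothetical, and it assumes that the header contains the words "Parallel version" for an MPI build and "Serial" otherwise. The exact wording may differ between versions, so compare it with what your own executable prints.

```shell
#!/bin/sh
# Classify a QUANTUM ESPRESSO output file as parallel or serial by
# inspecting its first lines.
# ASSUMPTION: the header contains "Parallel version" or "Serial";
# verify the wording printed by your own version of the code.
check_parallel_header() {
    header=$(head -n 40 "$1")   # $1: path of the output file
    case "$header" in
        *"Parallel version"*) echo "parallel" ;;
        *[Ss]erial*)          echo "serial" ;;
        *)                    echo "unknown" ;;
    esac
}

# Self-contained demonstration on a made-up header line.
sample=$(mktemp)
printf '     Parallel version (MPI), running on     4 processors\n' > "$sample"
check_parallel_header "$sample"
rm -f "$sample"
```

On a real run the function would be called as check_parallel_header outputfile, where outputfile is the file the code wrote its output to.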
Some implementations of the MPI library have problems with input
redirection in parallel. This typically shows up as
mysterious errors when reading data. If this happens, use the option
-i (or -in, -inp, -input),
followed by the input file name.
Example:
   pw.x -i inputfile -nk 4 > outputfile
Of course the
input file must be accessible by the processor that must read it
(only one processor reads the input file and subsequently broadcasts
its contents to all other processors).
Apparently the LSF implementation of MPI libraries manages to ignore or
confuse even the -i/-in/-inp/-input mechanism that is present in all
QUANTUM ESPRESSO codes. In this case, use the -i option of mpirun.lsf
to provide an input file.
If you notice very poor parallel performance with MPI and MKL libraries,
it is very likely that the OpenMP parallelization performed by the latter
is colliding with MPI. Recent versions of MKL enable autoparallelization
by default on multicore machines. To disable it, set the environment variable
OMP_NUM_THREADS to 1.
Note that if for some reason the correct setting of the variable
OMP_NUM_THREADS
does not propagate to all processors, you may equally run into trouble.
Lorenzo Paulatto (Nov. 2008) suggests using the -x option of mpirun to
propagate OMP_NUM_THREADS to all processors.
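The two steps above (setting OMP_NUM_THREADS and forwarding it) might look as follows. This is a sketch under assumptions: -x is the option of Open MPI's mpirun that exports an environment variable to the launched processes, while the number of processes (4), the variable name launch_cmd and the file names are placeholders. The script only prints the command and does not launch anything.

```shell
#!/bin/sh
# Disable OpenMP/MKL threading and forward the setting to all MPI processes.
# ASSUMPTIONS: Open MPI's mpirun (-x exports an environment variable to
# the launched processes); 4 processes and the file names are placeholders.
export OMP_NUM_THREADS=1

launch_cmd="mpirun -x OMP_NUM_THREADS -np 4 pw.x -i inputfile"

# Print the setting and the command for inspection only.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
echo "$launch_cmd > outputfile"
```

To check that the value really arrives on every process, one option is to launch a trivial command in place of pw.x, e.g. mpirun -x OMP_NUM_THREADS -np 4 sh -c 'echo $OMP_NUM_THREADS'. Other launchers use different options for the same purpose (for example -genv in MPICH's Hydra mpiexec).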
Axel Kohlmeyer suggests the following (April 2008):
"(I've) found that Intel is now turning on multithreading without any
warning and that is for example why their FFT seems faster than
FFTW. For serial and OpenMP based runs this makes no difference (in
fact the multi-threaded FFT helps), but if you run MPI locally, you
actually lose performance. Also if you use the 'numactl' tool on linux
to bind a job to a specific cpu core, MKL will still try to use all
available cores (and slow down badly). The cleanest way of avoiding
this mess is to either link with
   -lmkl_intel_lp64 -lmkl_sequential -lmkl_core   (on 64-bit: x86_64, ia64)
   -lmkl_intel -lmkl_sequential -lmkl_core        (on 32-bit, i.e. ia32)
or edit the libmkl_'platform'.a file. I'm using now a file
libmkl10.a with:
   GROUP (libmkl_intel_lp64.a libmkl_sequential.a libmkl_core.a)
It works like a charm". UPDATE: Since v.4.2, configure links
MKL without multithreaded support by default.
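For a dynamically linked executable, one way to check which MKL threading layer ended up in the link is to inspect the output of ldd. This is a sketch under assumptions: the function name classify_mkl_threading is hypothetical, libmkl_intel_thread and libmkl_gnu_thread are taken to be MKL's threaded layers (as opposed to libmkl_sequential), and a statically linked executable will show no MKL libraries at all.

```shell
#!/bin/sh
# Classify the MKL threading layer from `ldd` output read on stdin.
# (classify_mkl_threading is a hypothetical helper name.)
classify_mkl_threading() {
    libs=$(cat)
    case "$libs" in
        *libmkl_intel_thread*|*libmkl_gnu_thread*) echo "threaded" ;;
        *libmkl_sequential*)                       echo "sequential" ;;
        *)                                         echo "no MKL threading layer found" ;;
    esac
}

# Demonstration on a made-up ldd line (the path is a placeholder).
printf 'libmkl_sequential.so => /opt/intel/mkl/lib/libmkl_sequential.so\n' |
    classify_mkl_threading
```

On a real system the input would come from something like ldd "$(command -v pw.x)" | classify_mkl_threading.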
Many users of QUANTUM ESPRESSO, in particular those working on PC clusters,
have to rely on themselves (or on less-than-adequate system managers) for
the correct configuration of software for parallel execution. Mysterious and
irreproducible crashes in parallel execution are sometimes due to bugs
in QUANTUM ESPRESSO, but more often than not are a consequence of buggy
compilers or of buggy or miscompiled MPI libraries.
 
 
 
 
 
 
 
  
Filippo Spiga
2016-10-04