run on distributed-memory multiprocessor systems, including workstation
clusters, under the control of the Global Arrays parallel toolkit or the
MPI-2 library. There are also some parts of the code that can take
advantage of shared-memory parallelism, although these are somewhat
limited, and this facility is not at present recommended.
It should be noted that there remain
some parts of the code that are not, or only partly,
parallelized, and therefore run with replicated
work. Additionally, some of those parts which have been parallelized
rely on fast inter-node communications, and can be very inefficient
across ordinary networks. Therefore some caution and experimentation are
needed to avoid wasting resources in a multiuser environment.
2.2 Running MOLPRO on parallel computers
MOLPRO effects interprocess cooperation through
the ppidd library, which,
depending on how it was configured and built,
uses either the Global Arrays parallel toolkit or pure MPI.
ppidd is described in
Comp. Phys. Commun. 180, 2673-2679 (2009).
Global Arrays handles distributed data objects using whatever
one-sided remote memory access facilities are provided and supported.
In the case of the MPI implementation, there is a choice of using
either MPI-2 one-sided memory access, or devoting some of the
processes to act as data `helpers'. It is generally found that
performance is significantly better, and competitive with Global
Arrays, if at least one dedicated helper is used, and in some cases it
is advisable to specify more. The scalable limit is to devote one core
on each node of a typical multi-core cluster to a helper, but in most
cases it is possible to manage with fewer helpers, thereby making more
cores available for computation. This aspect of configuration can be
tuned through the *-helper-server options described below.
MOLPRO can be compiled in three different ways; which of these modes is
available is fixed at compilation time, and is reported in the job
output. The options, described below, for selecting the number and
location of processors are identical for MPP and MPPX. The three modes
are:
- Serial execution only. In this case, no parallelism is possible
at run time.
- `MPP': a number of copies of the program execute simultaneously,
cooperating on a single task. For example, a single CCSD(T) calculation
can run in parallel, with the work divided between the processors in
order to reduce the elapsed time.
- `MPPX': a number of copies of the program run in serial,
executing identical independent tasks. An example of this is the
calculation of gradients and frequencies by finite difference: for
the initial wavefunction calculation, the calculation is replicated
on all processes, but thereafter each process works in serial on a
different displaced geometry.
At present, this is implemented only for numerical gradients and