2.2 Running MOLPRO on parallel computers

MOLPRO will run on distributed-memory multiprocessor systems, including workstation clusters, under the control of the Global Arrays parallel toolkit or the MPI-2 library. There are also some parts of the code that can take advantage of shared memory parallelism through the OpenMP protocol, although these are somewhat limited, and this facility is not at present recommended. It should be noted that there remain some parts of the code that are not, or only partly, parallelized, and therefore run with replicated work. Additionally, some of those parts which have been parallelized rely on fast inter-node communications, and can be very inefficient across ordinary networks. Therefore some caution and experimentation is needed to avoid waste of resources in a multiuser environment.

MOLPRO effects interprocess cooperation through the ppidd library, which, depending on how it was configured and built, draws on either the Global Arrays parallel toolkit or pure MPI. ppidd is described in Comp. Phys. Commun. 180, 2673-2679 (2009). Global Arrays handles distributed data objects using whatever one-sided remote memory access facilities are provided and supported. In the case of the MPI implementation, there is a choice of using either MPI-2 one-sided memory access, or devoting some of the processes to act as data `helpers'. It is generally found that performance is significantly better, and competitive with Global Arrays, if at least one dedicated helper is used, and in some cases it is advisable to specify more. The scalable limit is to devote one core on each node in a typical multi-core cluster machine, but in most cases it is possible to manage with fewer, thereby making more cores available for computation. This aspect of configuration can be tuned through the *-helper-server options described below.

Molpro can be compiled in three different ways:

  1. Serial execution only. In this case, no parallelism is possible at run time.
  2. `MPP': a number of copies of the program execute simultaneously a single task. For example, a single CCSD(T) calculation can run in parallel, with the work divided between the processors in order to achieve a reduced elapsed time.
  3. `MPPX': a number of copies of the program run in serial executing identical independent tasks. An example of this is the calculation of gradients and frequencies by finite difference: for the initial wavefunction calculation, the calculation is replicated on all processes, but thereafter each process works in serial on a different displaced geometry. At present, this is implemented only for numerical gradients and Hessians.
Which of these three modes is available is fixed at compilation time, and is reported in the job output. The options, described below, for selecting the number and location of processors are identical for MPP and MPPX.

molpro@molpro.net 2018-11-14