2.2.1 Specifying parallel execution

The following additional options for the molpro command may be used to specify and control parallel execution.
-n | --tasks tasks/tasks_per_node:smp_threads
tasks specifies the number of parallel processes to be set up, and defaults to 1. tasks_per_node sets the number of GA (or MPI-2) processes to run on each node, where appropriate. The default is installation dependent. In some environments (e.g., IBM running under LoadLeveler, or a PBS batch job), the value given by -n is capped to the maximum allowed by the environment; in such circumstances it can be useful to give a very large number as the value for -n so that the number of processes is controlled by the batch job specification. smp_threads relates to the use of OpenMP shared-memory parallelism; it specifies the maximum number of OpenMP threads that will be opened, and defaults to 1. Any of these three components may be omitted, and appropriate combinations allow GA (or MPI-2)-only, OpenMP-only, or mixed parallelism.
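The three components combine as sketched below; the input file name h2o.inp is purely illustrative, and the forms with omitted components assume the separators described above:

```
molpro -n 8 h2o.inp        # 8 GA (or MPI-2) processes, 1 OpenMP thread each
molpro -n 8/4 h2o.inp      # 8 processes, at most 4 per node
molpro -n 4:2 h2o.inp      # 4 processes, up to 2 OpenMP threads each (mixed parallelism)
```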
-N | --task-specification user1:node1:tasks1,user2:node2:tasks2...
node1, node2 etc. specify the host names of the nodes on which to run. On most parallel systems, node1 defaults to the local host name, and there is no default for node2 and higher. On Cray T3E and IBM SP systems, and on systems running under the PBS batch system, if -N is not specified, nodes are obtained from the system in the standard way. tasks1, tasks2 etc. may be used to control the number of tasks on each node as a more flexible alternative to -n / tasks_per_node. If omitted, they are each set equal to -n / tasks_per_node. user1, user2 etc. give the username under which processes are to be created. Most of these parameters may be omitted in favour of the usually sensible default values.
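A fully specified example, following the user:node:tasks pattern above (the username jsmith, the host names node1 and node2, and the input file h2o.inp are hypothetical):

```
# Run 4 tasks on each of two named nodes, all under user jsmith
molpro -N jsmith:node1:4,jsmith:node2:4 h2o.inp
```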
-S | --shared-file-implementation method
specifies the method by which shared data are held in parallel. method can be sf or ga; by default it is set automatically according to the properties of the scratch directories. If method is set manually to sf, please ensure that all scratch directories are shared by all processes. Note that for the GA version of MOLPRO, if method is sf (whether set manually or by default), the scratch directories cannot be located on NFS when running a molpro job on multiple nodes, because the SF facility in Global Arrays does not work well across multiple nodes with NFS. There is no such restriction for the MPI-2 version of MOLPRO.
--multiple-helper-server nprocs_per_server
enables multiple helper servers; nprocs_per_server sets how many processes share one helper server. For example, when the total number of processes is 32 and nprocs_per_server=8, every 8 processes (including the helper server) share one helper server, giving 4 helper servers in total. Any unreasonable value of nprocs_per_server (i.e., any integer less than 2) is reset automatically to a very large number, which is equivalent to the option --single-helper-server.
--node-helper-server
specifies one helper server on every node if all nodes are symmetric and have a reasonable number of processes (i.e., every node runs the same number of processes, and that number is greater than 1); this is the default behaviour. Otherwise, only one single helper server is used for all processes/nodes, which is equivalent to the option --single-helper-server.
--single-helper-server
specifies only one single helper server for all processes.
--no-helper-server
disables the helper server.
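The helper-server options combine with -n as sketched below (the input file name h2o.inp is hypothetical; these options apply only to MPI-2 builds):

```
molpro -n 32 --multiple-helper-server 8 h2o.inp   # 4 helper servers, one per 8 processes
molpro -n 32 --single-helper-server h2o.inp       # one helper server for all processes
molpro -n 32 --no-helper-server h2o.inp           # no helper; all 32 processes compute
```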
-t | --omp-num-threads n
Specify the number of OpenMP threads, as if the environment variable OMP_NUM_THREADS were set to n.

Note that the options --multiple-helper-server, --node-helper-server, --single-helper-server, and --no-helper-server are only effective for MOLPRO built with the MPI-2 library. When one or more helper servers are enabled, those processes act as data helper servers and the remaining processes are used for computation. Even so, performance is quite competitive when running with a large number of processes. When the helper server is disabled, all processes are used for computation; however, performance may suffer because of the poor performance of some existing implementations of the MPI-2 standard for one-sided operations.

In addition, for MOLPRO built with the GA library (MPI over InfiniBand), GA data structures cannot be too large (e.g., more than 2 GB per node) when running a molpro job on multiple nodes. In this case, setting the environment variable ARMCI_DEFAULT_SHMMAX may help; the value should be less than 2 GB (e.g., to set 1600 MB for ARMCI_DEFAULT_SHMMAX in bash: export ARMCI_DEFAULT_SHMMAX=1600). One can also run such jobs on more compute nodes, so that the memory allocated for GA data structures on each node becomes smaller. There is no such restriction for the MPI-2 version of MOLPRO.
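The workaround above is applied in the shell before launching the job; the 1600 MB value is the one used in the example in the text:

```shell
# Cap ARMCI shared-memory segments at 1600 MB (below the 2 GB limit).
# The value is interpreted in megabytes, as in the example above.
export ARMCI_DEFAULT_SHMMAX=1600
echo "$ARMCI_DEFAULT_SHMMAX"
```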

molpro@molpro.net 2018-12-15