There are different GA implementation options (runtimes), and there are advantages and disadvantages for using one or the other implementation (see [[GA Installation]]).
  
**Since Molpro 2021.2 the [[#disk option]] is used by default** in single-node calculations, in which case large data structures are simply kept in MPI files.
The behavior of previous versions can be recovered with the ''%%--ga-impl ga%%'' command-line option.
However, ''%%--ga-impl ga%%'' requires pre-allocation of GA memory in many calculations if the ''socket'' GA runtime is used, and failing to pre-allocate a sufficient amount of GA memory may lead to crashes or incorrect results.
Pre-allocation of GA memory is not required with the ''mpi-pr'' runtime of GA or with the disk option.
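A minimal sketch of the two modes (the input file name ''h2o.inp'' and the core count are placeholders):

<code bash>
# Molpro 2021.2+ on a single node: the disk option is the default,
# so large global data structures are kept in MPI files
molpro -n 8 h2o.inp

# recover the behaviour of earlier versions, keeping the data in GlobalArrays
# (with the socket GA runtime this may require pre-allocation of GA memory,
#  see the memory specifications below)
molpro -n 8 --ga-impl ga h2o.inp
</code>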
===== GA Installation notes =====
  
  * **''-N'' $|$ ''%%--task-specification%%'' //user1:node1:tasks1,user2:node2:tasks2$\dots$//** //node1, node2// etc. specify the host names of the nodes on which to run. On most parallel systems, node1 defaults to the local host name, and there is no default for node2 and higher. On Cray T3E and IBM SP systems, and on systems running under the PBS batch system, if ''-N'' is not specified, nodes are obtained from the system in the standard way. //tasks1, tasks2// etc. may be used to control the number of tasks on each node as a more flexible alternative to ''-n'' / //tasks_per_node//. If omitted, they are each set equal to ''-n'' / //tasks_per_node//. //user1, user2// etc. give the username under which processes are to be created. Most of these parameters may be omitted in favour of the usually sensible default values.
  * **''-t'' $|$ ''%%--omp-num-threads%%'' //n//** Specify the number of OpenMP threads, as if the environment variable ''OMP_NUM_THREADS'' were set to //n//.
  * **''%%--ga-impl%%'' //method//** specifies the method by which large data structures are held in parallel. Available options are ''GA'' (GlobalArrays, the default in multi-node calculations) or ''disk'' (MPI files, the default on a single node since Molpro 2021.2; see [[#disk option]]). This option is most relevant for the more recent programs such as Hartree-Fock, DFT, MCSCF/CASSCF, and the PNO programs.
  * **''-D'' $|$ ''%%--global-scratch%%'' //directory//** specifies a scratch directory for the program that is accessible by all processors in multi-node calculations. This only affects parallel calculations with the [[running Molpro on parallel computers#disk option|disk option]].
  * **''%%--all-outputs%%''** produces an output file for each process when running in parallel. A combined example of these options is sketched after this list.
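A minimal sketch of a two-node run (host names ''node1''/''node2'', the scratch path, and the input file ''h2o.inp'' are placeholders; the optional //user// field of ''-N'' is omitted):

<code bash>
# 16 tasks on each of two nodes, one OpenMP thread per task,
# disk option with a scratch directory visible from both nodes,
# and one output file per process
molpro -N node1:16,node2:16 -t 1 --ga-impl disk \
       -D /network/scratch --all-outputs h2o.inp
</code>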
  
===== Memory specifications =====
  
  * **Density fitting and PNO calculations**: divide the memory into equal parts for GA and stack memory.
  * **HF, DFT, and MCSCF calculations**: 75% for the stack and 25% for GA.
  * **Canonical MRCI or CCSD(T) calculations** on one node: no GA space is needed. (These rules of thumb are illustrated in the example after this list.)
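As an illustration of these rules of thumb on a single node with 16 processes (the concrete numbers and input file names are hypothetical; a trailing ''m'' is assumed to mean megawords, 1 word = 8 bytes):

<code bash>
# DF/PNO calculation, equal split:
# 16 x 500 MW = 8000 MW of stack in total and 8000 MW of GA
molpro -n 16 -m 500m -G 8000m pno_lccsd.inp

# DFT calculation, roughly 75% stack / 25% GA:
# 16 x 750 MW = 12000 MW of stack and 4000 MW of GA
molpro -n 16 -m 750m -G 4000m dft.inp

# canonical CCSD(T) on one node: no GA space is needed
molpro -n 16 -m 1000m ccsdt.inp
</code>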
  
  
  * **''-M''** As described above.
  * **''-M'' and ''-m''** The specified amount $m$ is allocated for each core, and the remaining memory is used for GA.
  * **''-M'' and ''-G''** The specified amount $G$ is allocated for GA, and the remaining amount is split equally among the processes for stack memory.
  * **''-M'' and ''-G'' and ''-m''** The specified amounts of $m$ and $G$ are allocated, and the $M$ value is ignored (see the worked example below).
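A worked illustration of these combinations for 8 processes on a single node, assuming that ''-M'' refers to the total memory of the job and that a trailing ''g'' means gigawords (the numbers and the input file are hypothetical):

<code bash>
# -M with -m: each of the 8 cores gets 1 GW of stack memory,
# and the remaining 16 - 8x1 = 8 GW are used for GA
molpro -n 8 -M 16g -m 1g h2o.inp

# -M with -G: 4 GW are reserved for GA, and the remaining
# 16 - 4 = 12 GW are split equally, i.e. 1.5 GW of stack per process
molpro -n 8 -M 16g -G 4g h2o.inp

# -M with both -m and -G: m and G are used as given, M is ignored
molpro -n 8 -M 16g -m 1g -G 4g h2o.inp
</code>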
  
Since version 2021.1, Molpro can use MPI files instead of GlobalArrays to store large global data. This option can be enabled globally by setting the environment variable ''MOLPRO_GA_IMPL'' to ''DISK'', or by passing the ''%%--ga-impl disk%%'' command-line option.
Since version 2021.2, the disk option is the default in single-node calculations.
Some programs in Molpro, including DF-HF, DF-KS, (DF-)MULTI, DF-TDDFT, and PNO-LCCSD, also support an input option ''implementation=disk'' to enable the disk option for the particular job step.
The file system for these MPI files must be accessible by all processors.
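For example, either of the following enables the disk option globally (the input file name is a placeholder):

<code bash>
# via the environment
export MOLPRO_GA_IMPL=DISK
molpro -n 8 h2o.inp

# or via the command line
molpro -n 8 --ga-impl disk h2o.inp
</code>

To enable it for a single job step only, an input fragment along the following lines can be used (the placement of the option follows the usual comma-separated Molpro command-option syntax; the program chosen here is just an example):

<code>
{df-hf,implementation=disk}
</code>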
The directory can be tmpfs (e.g., ''-D /dev/shm'') in single-node calculations, and in this case the GAs / MPI files are kept in shared memory.
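A minimal single-node sketch keeping the MPI files in shared memory (core count and input file are placeholders):

<code bash>
# single-node disk-option run with the MPI files held in tmpfs
molpro -n 16 --ga-impl disk -D /dev/shm h2o.inp
</code>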
  
With the disk option the problems associated with GA pre-allocation are avoided. In this case
use only ''-m'' or the ''memory'' card to specify the Molpro scratch memory for each processor. To avoid GA pre-allocation, **do not provide ''-M'' or ''-G''**.
Please also make sure that ''-M'' and ''-G'' are not present in ''.molprorc'', etc.
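For instance (memory amount and input file are placeholders; ''~/.molprorc'' is assumed to be the location of the defaults file):

<code bash>
# disk-option run: per-process stack memory only, no -M and no -G
molpro -n 16 -m 1000m h2o.inp

# check that no GA-related memory options are set in .molprorc
grep -nE -- '-(M|G)([[:space:]=]|$)' ~/.molprorc
</code>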
  
The performance of the disk option varies depending on the I/O capacity, available system memory, the MPI software, and the nature of the calculation.
Usually, the best practice is to reserve some system memory for the system to buffer I/O operations (i.e., not to allocate all available memory to Molpro with ''-m'' or the ''memory'' input card).
When this is done, the performance of single-node disk-based calculations can be comparable to GA-based ones in many cases, in particular with SSDs.
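As a hypothetical illustration (node size, process count, and memory amounts are not from the manual): on a 256 GB node with 16 processes, the following leaves roughly half of the memory to the operating system for buffering the MPI file I/O.

<code bash>
# 16 x 1000 MW x 8 bytes/word ~ 128 GB allocated to Molpro,
# ~128 GB left for the OS page cache
molpro -n 16 -m 1000m h2o.inp
</code>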
  
===== Embarrassingly parallel computation of gradients or Hessians (mppx mode) =====
The numerical computation of gradients or Hessians, or the automatic generation of potential energy surfaces, requires many similar calculations at different (displaced) geometries. An automatic parallel computation of the energy and/or gradients at different geometries is implemented for the gradient, hessian, and surf programs. In this so-called mppx-mode, each processing core runs an independent calculation in serial mode. This happens automatically using the ''-n'' available cores. The automatic mppx processing can be switched off by setting option ''mppx=0'' on the ''OPTG'', ''FREQ'', or ''HESSIAN'' command lines. In this case, the program will process each displacement in the standard parallel mode.
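A sketch of an input fragment that switches the automatic mppx processing off for a numerical frequency calculation (geometry, basis, and method directives are omitted; the option placement follows the usual comma-separated Molpro command syntax):

<code>
! run the Hessian displacements in standard parallel mode
! instead of the embarrassingly parallel mppx mode
freq,mppx=0
</code>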
  
===== Options for developers =====

==== Debugging options ====

  * **''%%--ga-debug%%''** activates GA debugging statements.
  * **''%%--check-collective%%''** checks collective operations when debugging (see the example after this list).
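For instance (core count and input file are placeholders):

<code bash>
# developer run with GA debugging output and checking of collective operations
molpro -n 4 --ga-debug --check-collective test.inp
</code>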
  
==== Options for pure MPI-based PPIDD build ====
  
This section is **not** applicable if the Molpro binary release is used, or when Molpro is built using the GlobalArrays toolkit (which we recommend).