There are different ways (GA "runtimes") to configure the GA library, which significantly affect the execution and the available options, and they are therefore important from a user's point of view.
The current default of the GA installation is to use ''%%--with-mpi-ts%%'', which is **very slow** and should be avoided.
Our current recommendations are:
  * Use ''%%--with-sockets%%'' for single workstations and servers connected by ethernet.
  * Use ''%%--with-openib%%'' for nodes connected by InfiniBand, preferably on top of the latest version of [[https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX|openmpi with UCX]] (see the sketch after this list).
  * When a helper process on each node can be tolerated, use ''%%--with-mpi-pr%%'', also preferably using the latest version of openmpi with UCX.
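As an illustration, a GA build with the ''openib'' runtime could be configured roughly as follows; the MPI location, installation prefix and ''make'' parallelism are placeholders and depend on the local setup:
<code bash>
## Minimal sketch of building GA with the openib runtime; the MPI path,
## installation prefix and make parallelism are examples only.
export PATH=/opt/openmpi-ucx/bin:$PATH   # MPI to compile against (example path)
./configure --with-openib --prefix=/opt/ga
make -j8
make install
</code>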
The first two GA runtimes do not need any helper processes, but they have a serious disadvantage: with ''--ga-impl ga'' (the default for multi-node calculations since Molpro 2020.2, and for all calculations in earlier versions), the GA software frequently crashes or gives wrong numbers if several GAs are allocated whose total size exceeds 16 GB (2 GW). This can be avoided by using ''--ga-impl disk'' (the default for single-node calculations since Molpro 2020.2), or by allocating at the beginning of the job a very large GA with at least the size of the total GA space needed in the calculation (which is not known in advance). The latter can be done with the ''-G'' or ''-M'' molpro options.
See [[Running Molpro on parallel computers]] for more details.
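For instance, such a preallocation with ''-G'' might look like the following sketch; the process count and the GA size are purely illustrative and assume the usual Molpro memory suffix notation:
<code bash>
## Illustrative only: run on 16 processes and preallocate 4 gigawords
## (32 GB) of GA space at the start of the job via the -G option.
molpro -n 16 -G 4g input.inp
</code>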
The ''mpi-pr'' runtime requires one helper process on each node. Thus, if for example ''-n40'' is specified as a molpro option in a 2-node calculation, only 38 MPI processes run molpro, and 2 additional helper processes are started. The performance of the ''mpi-pr'' runtime is usually comparable to the ''socket'' or ''openib'' runtimes. GA preallocation is not needed and is automatically disabled by Molpro with the ''mpi-pr'' runtime.
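As a concrete illustration of this accounting (the input file name is a placeholder), such a 2-node run could be started as:
<code bash>
## Illustrative only: 40 ranks spread over 2 nodes with the mpi-pr runtime;
## 38 processes run Molpro and 1 rank per node acts as a GA helper process.
molpro -n 40 input.inp
</code>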
The ''socket'' and ''openib'' GA runtimes use System V shared memory, while the ''mpi-pr'' runtime uses POSIX shared memory. In the former case you may need to raise the values of the ''SHMMAX'' and ''SHMALL'' kernel parameters, in the latter case the size limit of ''/dev/shm''. Also note that even in successful calculations shared memory segments are sometimes left behind in ''/dev/shm''. These can accumulate and make a machine unusable, and should therefore be deleted from time to time.
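As a rough sketch, these limits can be inspected and raised as follows; the sizes are examples only, the commands require root privileges, and changes made with ''sysctl'' and ''mount'' are not persistent across reboots unless added to the system configuration:
<code bash>
## System V shared memory (socket / openib runtimes)
sysctl kernel.shmmax kernel.shmall       # inspect current limits
sysctl -w kernel.shmmax=68719476736      # max size of one segment in bytes (example: 64 GB)
sysctl -w kernel.shmall=16777216         # total shared memory in 4 kB pages (example: 64 GB)

## POSIX shared memory (mpi-pr runtime)
df -h /dev/shm                           # inspect current size limit
mount -o remount,size=64G /dev/shm       # raise the limit (example: 64 GB)

## Check for and clean up stale segments left behind by earlier jobs
ipcs -m
ls -lh /dev/shm
</code>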
Queries regarding Global Arrays installations should be sent directly to the Global Arrays team; any Molpro-related queries will assume a fully functional Global Arrays suite with all internal tests run successfully.
| |