[molpro-user] running parallel GA Molpro from under Torque?

Jeff Hammond jeff.science at gmail.com
Wed Nov 27 20:56:40 GMT 2013


>>> Thanks for the advice! We don't have MVAPICH2, but there is Intel MPI,
>>> which is a derivative of it.
>>
>>MVAPICH and Intel MPI are derivatives of MPICH.  The IB code in Intel
>>MPI is different than in MVAPICH.  As far as I know, Intel uses DAPL,
>>which is ancient, whereas MVAPICH uses OFED (and PSM, if necessary).
>
> Intel has both. What it doesn't have is Torque CPUset support.

I recommend that you send a message to discuss at mpich.org reporting
this issue, in the hope that it can be fixed in Hydra, the upstream
process manager.

>>Building MVAPICH from source is not complicated and does not require
>>admin rights.
>
> I do have admin rights. Actually, as an admin, I am reluctant to
> multiply the MPI implementations we have to support, and I don't think
> our other staff members would like it either. OpenMPI is our standard,
> we know how to use it with our IB, and we rely on CPUsets, since we
> schedule by core, not by whole node.

Given how many codes are bandwidth-limited, you might want to ask
yourself whether it would make more sense to schedule according to the
most limited resource on the node, which is the memory bus.  If a user
is running STREAM on one core, the other cores will be borderline
useless for anything other than computing in L1+registers.  Sparse
matrix-vector multiplication and many other scientific computations
are good approximations to STREAM.
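
If you want to see this on your own nodes, a quick experiment along
these lines makes the point (a rough sketch from memory; the STREAM
URL and compiler flags are mine, so adjust for your site):

  # grab McCalpin's STREAM benchmark and build the OpenMP version
  wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c
  gcc -O3 -fopenmp stream.c -o stream
  # run with increasing thread counts; the Triad rate stops scaling
  # long before you run out of cores
  for t in 1 2 4 8 16; do
    echo "threads=$t"
    OMP_NUM_THREADS=$t ./stream | grep Triad
  done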

You might want to consider a policy of scheduling a few nodes by core
for the simpletons that don't run in parallel and allocating the rest of
the machine according to the properties of modern hardware, which
means giving users a dedicated memory controller, IO controller,
network endpoint, etc.
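
In practice that can be as coarse as handing each job a whole NUMA
domain.  A sketch of what I mean, assuming numactl is installed and
./a.out stands in for the user's code:

  # bind the job's processes and its memory to socket 0, so it owns
  # that memory controller and nobody else can steal its bandwidth
  numactl --cpunodebind=0 --membind=0 ./a.out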

Global Arrays does nasty things to the system resources (e.g. Sys5
shared memory, IB-registered and locked pages, etc.) such that I would
consider it at best dangerous to co-schedule Molpro or NWChem with
anything else.  PNNL (home of NWChem) purges all the node resources
after each job because of issues like this.  And if an ARMCI
application crashes, I have no confidence that all of the system
resources are cleaned up by the signal handler.  You need not just a
"pkill -9 -u $USER" but also an "ipcclean" (i.e. an ipcs/ipcrm sweep of
that user's segments) to have confidence that you're not giving the
next user a bum node.
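
If you do end up sharing nodes anyway, at least do that cleanup in your
Torque epilogue.  A sketch of what I mean (untested; check your Torque
documentation for which epilogue argument carries the job owner, and
obviously exempt system accounts):

  #!/bin/sh
  JOBUSER="$2"   # Torque normally passes the job owner as the second argument
  # kill anything the job owner left behind
  pkill -9 -u "$JOBUSER"
  # remove leftover SysV shared-memory segments and semaphores owned by that user
  for id in $(ipcs -m | awk -v u="$JOBUSER" '$3 == u {print $2}'); do ipcrm -m "$id"; done
  for id in $(ipcs -s | awk -v u="$JOBUSER" '$3 == u {print $2}'); do ipcrm -s "$id"; done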

>>> The only way to use Intel MPI I know of is OSC mpiexec; this one does
>>> not allow oversubscription (i.e., no data-server processes) but would
>>> use the Torque API to launch processes. Is there a(n easy) way to use
>>> OSC mpiexec with the "molpro" script? Has anyone tried it?
>>
>>Why do you want to oversubscribe?  I bet you that Molpro runs better
>>if you don't.  Have you done a detailed performance evaluation of this
>>already?
>
> That is a good question. I haven't tried it yet. Perhaps I misread the
> documentation: would --multiple-helper-server oversubscribe, in Torque's
> terms, by creating additional data-server processes the way GAMESS-US
> does, or would you just end up with some wasted cores? And it is said
> that performance would be bad without them (perhaps due to MPI
> polling?). How bad is it?

I have only limited knowledge of Molpro's PPIDD right now, but the way
GAMESS does things is a legacy of the 20th century, before RDMA became
available on clusters.  It's sad that they haven't modernized DDI to
match modern hardware.

As for wasting cores, you have to ask yourself, first, whether you are
strictly compute-bound and, second, whether you can utilize the idle
cores effectively with multithreaded MKL.  If Molpro is DGEMM-limited, then
threading in MKL should be fine with 1-2 MPI processes per socket.  If
it is AO-limited, it wouldn't surprise me if it is bandwidth- and/or
cache-limited.  Finally, if it is IO-limited, slamming on one IO hub
with more cores isn't really effective.
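
The shape of the experiment I would run is something like this (only a
sketch; whether you actually get threaded MKL depends on how your
Molpro was linked, and I am assuming two 8-core sockets per node and
the usual molpro wrapper's -n option):

  # on a 16-core node, try 4 MPI ranks with 4 MKL threads each
  # instead of 16 single-threaded ranks, and compare wall times
  export OMP_NUM_THREADS=4
  export MKL_NUM_THREADS=4
  molpro -n 4 your_input.com   # your_input.com is a stand-in for the real job

Do the same with 8x2 and the fully packed 16x1 run before you settle on
a policy.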

Again, I am not a Molpro expert, but when we did a detailed study of
NWChem [http://onlinelibrary.wiley.com/doi/10.1002/cpe.1881/abstract -
https://wiki.alcf.anl.gov/parts/images/2/2d/TAU-ARMCI.pdf], we found
that mild undersubscription was optimal.

Jeff



-- 
Jeff Hammond
jeff.science at gmail.com


