[molpro-user] Re: Clash between multiple instances of Molpro on one node

Wed Oct 25 18:06:00 BST 2006

This is a follow-up to a question I posted here on October 14.  I've
meanwhile learned a bit more about the problem.  The issue was:

> I'm using the sequential version of Molpro, but running multiple
> instances in parallel on a multiprocessor computer.  Sometimes this
> results in unreproducible, seemingly random failures of the Molpro
> runs, usually already at the Hartree-Fock level.  What might be the
> cause and what might be a remedy?  Sorry not to be more specific, it
> is happening to me on quite different computer systems and for
> different Molpro calculations, and with 2002.6 and also 2006.2.
> Should I look for a cause in memory contention, disk contention,
> something in my directory structure, or maybe there is some subtlety
> that needs to be taken into account when Molpro is installed?

One system on which I experienced this problem was a new Opteron
cluster, two CPUs per node, each dual-core, with Molpro 2006.2
installed.  I can run four small instances of Molpro in parallel on
one node and get close to a factor of four improvement in turnaround.
The problem occurred with larger problem instances.  It turned out that
I had mistakenly left $SCRATCH undefined, and as a result my
NFS-mounted home directory was used for scratch space.  One might
have expected the calculations merely to slow down, but instead they
failed.  I don't want to try to trace the precise location where the
calculations terminate, but the punch file reports an RHF failure.
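
A minimal sketch of a pre-flight check that would catch an undefined
$SCRATCH before any Molpro run is started.  The function name
check_scratch is hypothetical, and the check only verifies that the
variable is set and names a writable directory; it does not try to
detect whether that directory sits on NFS or on a local disk.

```python
import os


def check_scratch(env=None):
    """Return the scratch directory named by SCRATCH, or raise.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test.  (Hypothetical helper, not part of Molpro.)
    """
    env = os.environ if env is None else env
    path = env.get("SCRATCH")
    if not path:
        raise RuntimeError("SCRATCH is not set; refusing to start")
    if not os.path.isdir(path):
        raise RuntimeError("SCRATCH=%r is not a directory" % path)
    if not os.access(path, os.W_OK):
        raise RuntimeError("SCRATCH=%r is not writable" % path)
    return path
```

Calling check_scratch() once at the start of each worker process makes
a missing setting fail loudly instead of silently falling back to the
home directory.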

The initial solution was of course to let $SCRATCH point to a local
disk on each node, as should have been the case all along.  However,
a problem remained, although a less transparent one.  I start up
my calculations, four per node, and the integral evaluations take a
very long time.  The time might be less than 10 seconds CPU time and
10-20 seconds wall clock time if a single calculation is running on
the node, but with four calculations it would slow down to 1000
seconds wall clock time or more, albeit without crashing.  This is the
time reported in the *.out file in lines such as this:

 SORT1 READ  104548810. AND WROTE   83425371. INTEGRALS IN  242
 RECORDS.  CPU TIME:     2.77 SEC, REAL TIME:     6.67 SEC
 SORT2 READ   83425371. AND WROTE   85040361. INTEGRALS IN 1655
 RECORDS.  CPU TIME:     3.15 SEC, REAL TIME:     5.47 SEC

(except that the real time would be 1000+ seconds.)  The situation is,
I think, that the Molpro calculation starts with the integral
evaluation, and when I submit four jobs they all start there at the
same time.  Each of my processes is part of a "bag of tasks" parallel
computation.  The process repetitively grabs a Molpro input file,
performs the requested calculation, disposes of the output, and moves
on to find the next unprocessed input file.  After a while the four
processes on one node (that are calculating different geometries of
the same molecule) are no longer synchronous and the times are
typically tens or at most hundreds of seconds for the integral evaluation.
Nevertheless, since I don't trust the system to survive the very heavy
initial disk load I now make sure also to stagger the starting times
of the four processes that run on a single node.
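
One way to stagger the four processes on a node can be sketched as
follows.  This is an illustration only: the slot index (0 to 3), the
60-second spacing, and the function names are all assumptions and are
not taken from my actual scripts.

```python
import time


def start_delay(slot, spacing_seconds=60.0):
    """Delay in seconds before the first calculation of the process
    occupying `slot` on its node (slot 0 starts immediately)."""
    if slot < 0:
        raise ValueError("slot must be non-negative")
    return slot * spacing_seconds


def wait_for_turn(slot, spacing_seconds=60.0, sleep=time.sleep):
    """Sleep for this slot's delay and return the delay used.

    `sleep` is injectable so the function can be exercised without
    actually waiting.
    """
    delay = start_delay(slot, spacing_seconds)
    if delay > 0:
        sleep(delay)
    return delay
```

Each process would call wait_for_turn(slot) once, before grabbing its
first input file; later calculations need no delay because the
processes have drifted apart by then.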

The other system on which I had the problems reported on October 14
was at one of the supercomputer centers.  There they don't use local
scratch disk; instead they have a very large central disk system,
using striping for fast and highly parallel access.  On this system I
must use mpich to start up my jobs; I get, say, 400 CPUs, and they all
start their first Molpro calculation at the same time.  I think the
disk system, fast as it is, is just overwhelmed anyway.  What seems to
happen is that all jobs start at once, they overwhelm the disk, and
they fail (I don't know why they fail and not just slow down, but that
is what happens), and then my controlling process grabs the next
Molpro input and works on that.  But that next Molpro calculation
again commences with integral evaluation, so there is never any relief
for the disk system, and my entire bag of tasks gets quickly depleted
with all jobs failing in the earliest stage.

In this case my workaround, which I'm almost inclined to call a
solution, is again to stagger the starting times.  My job sits in the
batch queue waiting for its assigned nodes; it then receives its 400
CPUs, or whatever I ask for, and I let each task sleep for a random
amount of time before commencing its first Molpro calculation.  After
that they all progress along their own course and I trust that it
won't happen that too many of them try to use disk at the same time.
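
The random initial sleep can be sketched like this.  The uniform
distribution and the 1200-second upper bound are assumptions chosen
for illustration; a suitable maximum depends on the number of tasks
and on the disk system.

```python
import random
import time


def random_start_delay(max_delay_seconds, rng=random):
    """Draw a delay uniformly from [0, max_delay_seconds]."""
    if max_delay_seconds < 0:
        raise ValueError("max_delay_seconds must be non-negative")
    return rng.uniform(0.0, max_delay_seconds)


def sleep_before_first_job(max_delay_seconds=1200.0,
                           sleep=time.sleep, rng=random):
    """Sleep once for a random time; call this only before the first
    Molpro calculation of a task.  Returns the delay that was used."""
    delay = random_start_delay(max_delay_seconds, rng)
    sleep(delay)
    return delay
```

Because every task draws its own delay, no coordination between the
tasks is needed, which matters when they are started together by the
batch system.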

Note that the integral evaluation part is only a small part of the
complete Molpro calculation; it takes maybe one percent of the total
CPU time.  A clash among many of my processes is not so likely,
except, of course, at start-up time when they are all in synchrony.
So I trust that by staggering the starting times of the processes and
surviving the initial 20 minutes or so I'll then get my 12 hours of
happy computing.  So far it seems to work.
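
A rough way to put a number on "not so likely" is a binomial estimate.
It rests on two assumptions that go beyond what I have measured: that
the one percent of CPU time also holds as a fraction of wall-clock
time, and that the processes are independent once desynchronized.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n independent processes are in
    the integral-evaluation phase at a given instant, when each one
    spends a fraction p of its time there (binomial tail)."""
    return sum(comb(n, j) * p**j * (1.0 - p)**(n - j)
               for j in range(k, n + 1))


# Four processes per node, p = 0.01: chance that two or more overlap.
print(prob_at_least(2, 4, 0.01))
```

For four processes on a node this comes out at roughly 0.0006.  Under
the same assumptions, 400 tasks sharing one disk system would have on
average 400 * 0.01 = 4 tasks in the integral phase at any moment,
rather than all 400 as at start-up.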

Bas
--
Bastiaan J. Braams
braams at mathcs.emory.edu
Emory University, Atlanta, GA