[molpro-user] Re: Clash between multiple instances of Molpro on one node

Thu Oct 26 07:27:11 BST 2006

Dear Bastiaan,
Of course I don't know what the reason for your crashes is, 
but the following things are important:

1.) Make sure the scratch file systems specified in bin/molpro.rc
    are correct and local on your system. It is of course deadly if
    a scratch file system is accessed over NFS. This will be very slow
    and can lead to crashes if there is a network problem, e.g. with a switch.
    It helps a lot if you have two (or more) scratch file systems 
    on one machine (independent), since then the sort can read from
    one and write to the other. Your molpro.rc should then look like

    -d$TMPDIR:$TMPDIR4    # directory in which the program should be run
    -I$TMPDIR             # directory to store permanent copies of int files (1)
    -W$HOME/wfu           # directory to store permanent copies of wf files (2,3)

    where $TMPDIR points to the directory in which Molpro is running, and
    $TMPDIR4 is a secondary directory (different file system and disk)
    where the sorting file (4) is stored. If $TMPDIR and $TMPDIR4 are
    defined in your environment at configure/make time, this will be
    created automatically. It is important that the directories
    specified for -d and -I are the same; if -I were different
    and you used a permanent file 1, the integral file would be
    copied to the -I directory at the end of the job.

    Should you indeed have 2 file systems, it may also help to let the
    first job run in $TMPDIR and the second in $TMPDIR4. You can achieve
    this by redefining $TMPDIR and $TMPDIR4 in a job script, like

    export TMPDIR=/scr2
    export TMPDIR4=/scr1
    molpro ...

2.) If the output prints

> SORT1 READ  104548810. AND WROTE   83425371. INTEGRALS IN  242
> RECORDS.  CPU TIME:     2.77 SEC, REAL TIME:     6.67 SEC
> SORT2 READ   83425371. AND WROTE   85040361. INTEGRALS IN 1655
> RECORDS.  CPU TIME:     3.15 SEC, REAL TIME:     5.47 SEC

   then the elapsed time for the sort was about 12 seconds, not 1000.
   The bottleneck is then probably something else.

3.) Are you restarting using some permanent file 2 or 3? 
    In this case each job copies, at the beginning, a previously existing
    file from the directory specified after -W (e.g. $HOME/wfu) to $TMPDIR.
    When the job has finished, the file is copied back for later restart.
    If $HOME/wfu is mounted over NFS and many jobs start at the same time,
    this might lead to a bottleneck.  Of course, each job which runs in 
    the same $TMPDIR must use a different file name, otherwise crashes are likely.
    If you don't need restart, do not specify any files in the input.
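
    One way to rule out name clashes in a shared scratch area is to give
    each job its own scratch subdirectory instead of relying on different
    file names. This is only a sketch of that alternative, not something
    Molpro requires: the helper name make_job_scratch, the prefix
    "molpro.", the path /scr1 and the input name input.com are
    assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: create a private scratch directory for one job
# below a local (non-NFS) scratch file system and print its path.
make_job_scratch() {
    base=${1:-/tmp}                      # assumed local scratch root
    mktemp -d "$base/molpro.XXXXXX"      # unique directory per call
}

# Intended use in a job script (commented out so that the sketch can be
# sourced without starting Molpro):
#   TMPDIR=$(make_job_scratch /scr1) || exit 1
#   export TMPDIR
#   molpro input.com
#   rm -rf "$TMPDIR"
```

    Permanent files copied to and from the -W directory would still need
    distinct names, since that directory is shared between jobs.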

4.) A likely reason for exceedingly long elapsed times is paging. Are you sure
    your system has enough memory for supporting several jobs running in
    parallel? Specify as little memory as possible on the MEMORY directive!
    Some people tend to allocate enormous amounts of memory, like 
    memory,300,m or so. This would allocate 2.4 GB of dynamic memory plus some
    more for the program. If you start 2 jobs and your machine has only 4 GB, 
    you might be in trouble. The more Molpro allocates, the less is available
    for I/O buffering.
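
    The arithmetic above can be checked before submitting jobs. The
    sketch below assumes 8 bytes per word (which is what makes 300
    megawords equal 2.4 GB) and ignores the extra memory the program
    itself needs, so it gives a lower bound only. The function names
    are made up for this example.

```shell
#!/bin/sh
# Dynamic memory in MB requested by "memory,N,m" (N megawords),
# assuming 8 bytes per word.
molpro_mem_mb() {
    echo $(( $1 * 8 ))
}

# jobs_fit MEGAWORDS NJOBS RAM_MB
# Succeeds if NJOBS jobs of MEGAWORDS each need no more than RAM_MB in total.
# Lower bound only: program memory and I/O buffers are not counted.
jobs_fit() {
    need=$(( $1 * 8 * $2 ))
    [ "$need" -le "$3" ]
}

# Example: two jobs with memory,300,m on a machine with about 4 GB
if jobs_fit 300 2 4000; then
    echo "fits"
else
    echo "too much: $(( 300 * 8 * 2 )) MB requested"
fi
```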

Best regards
Joachim

On Wed, 25 Oct 2006, Bastiaan J. Braams wrote:

>This is a follow-up to a question I posted here on October 14.  I've
>meanwhile learned a bit more about the problem.  The issue was:
>
>> I'm using the sequential version of Molpro, but running multiple
>> instances in parallel on a multiprocessor computer.  Sometimes this
>> results in unreproducible, seemingly random failures of the Molpro
>> runs, usually already at the Hartree-Fock level.  What might be the
>> cause and what might be a remedy?  Sorry not to be more specific, it
>> is happening to me on quite different computer systems and for
>> different Molpro calculations, and with 2002.6 and also 2006.2.
>> Should I look for a cause in memory contention, disk contention,
>> something in my directory structure, or maybe there is some subtlety
>> that needs to be taken into account when Molpro is installed?
>
>One system on which I experienced this problem was a new Opteron
>cluster, two cpu's per node, each dual core, with Molpro 2006.2
>installed.  I can run four small instances of Molpro in parallel on
>one node and get close to a factor of four improvement in turnaround.
>The problem occurred with larger problem instances.  It turned out that
>I had mistakenly left $SCRATCH undefined, and as a result my
>nfs-mounted home directory was used for scratch space.  Maybe the
>calculations might have just slowed down, but instead they failed.  I
>don't want to try to trace the precise location where the calculations
>terminate, but in the punch file an RHF failure is reported.
>
>The initial solution was of course to let $SCRATCH point to a local
>disk on each node, as should have been the case all along.  However,
>now there was still a problem, although less transparent.  I start up
>my calculations, four per node, and the integral evaluations take a
>very long time.  The time might be less than 10 seconds CPU time and
>10-20 seconds wall clock time if a single calculation is running on
>the node, but with four calculations it would slow down to 1000
>seconds wall clock time or more, albeit without crashing.  This is the
>time reported in the *.out file in lines such as this:
>
> SORT1 READ  104548810. AND WROTE   83425371. INTEGRALS IN  242
> RECORDS.  CPU TIME:     2.77 SEC, REAL TIME:     6.67 SEC
> SORT2 READ   83425371. AND WROTE   85040361. INTEGRALS IN 1655
> RECORDS.  CPU TIME:     3.15 SEC, REAL TIME:     5.47 SEC
>
>(except that the real time would be 1000+ seconds.)  The situation is,
>I think, that the Molpro calculation starts with the integral
>evaluation, and when I submit four jobs they all start there at the
>same time.  Each of my processes is part of a "bag of tasks" parallel
>computation.  The process repetitively grabs a Molpro input file,
>performs the requested calculation, disposes of the output, and moves
>on to find the next unprocessed input file.  After a while the four
>processes on one node (that are calculating different geometries of
>the same molecule) are no longer synchronous and the times are,
>usually, tens or at most 100s of seconds for the integral evaluation.
>Nevertheless, since I don't trust the system to survive the very heavy
>initial disk load I now make sure also to stagger the starting times
>of the four processes that run on a single node.
>
>The other system on which I had the problems reported on October 14
>was one of the supercomputer centers.  There they don't use local
>scratch disk, instead they have a very large central disk system,
>using striping for fast and highly parallel access.  On this system I
>must use mpich to start up my jobs, I get, say 400 CPUs, and they all
>start their first Molpro calculation at the same time.  I think the
>disk system, fast as it is, is just overwhelmed anyway.  What seems to
>happen is that all jobs start at once, they overwhelm the disk, and
>they fail (I don't know why they fail and not just slow down, but that
>is what happens), and then my controlling process grabs the next
>Molpro input and works on that.  But that next Molpro calculation
>again commences with integral evaluation, there is never any relief
>for the disk system, and my entire bag of tasks gets quickly depleted
>with all jobs failing in the earliest stage.
>
>In this case my workaround, which I'm almost inclined to call a
>solution, is again to stagger the starting times.  My job sits in the
>batch queue waiting for its assigned nodes, it then receives its 400
>CPUs, or whatever I ask for, and I let each task sleep for a random
>amount of time before commencing its first Molpro calculation.  After
>that they all progress along their own course and I trust that it
>won't happen that too many of them try to use disk at the same time.
>
>Note that the integral evaluation part is only a small part of the
>complete Molpro calculation, maybe it takes one percent of the total
>CPU time.  A clash among many of my processes is not so likely,
>except, of course, at start-up time when they are all in synchrony.
>So I trust that by staggering the starting times of the processes and
>surviving the initial 20 minutes or so I'll get then my 12 hours of
>happy computing.  So far it seems to work.
>
>Bas
>--
>Bastiaan J. Braams
>braams at mathcs.emory.edu
>Emory University, Atlanta, GA
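
The staggered start described in the quoted message could be sketched
in a job script roughly as follows. This is an illustration under
assumptions, not the script actually used: the function name
random_delay, the 600 second upper bound and the input name are
invented, and /dev/urandom is assumed to be available. The random
number is deliberately not seeded from the clock, because tasks that
start in the same second would then all draw the same delay.

```shell
#!/bin/sh
# random_delay MAX: print a pseudo-random integer in the range 0..MAX-1.
# Reads two bytes from /dev/urandom so that tasks launched at the same
# instant still get different values.
random_delay() {
    max=$1
    r=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( r % max ))
}

# Intended use before the first calculation of each task (commented out
# so that the sketch can be sourced without sleeping or starting Molpro):
#   sleep "$(random_delay 600)"
#   molpro input.com
```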

-- 
Prof. Hans-Joachim Werner
Institute for Theoretical Chemistry
University of Stuttgart
Pfaffenwaldring 55
D-70569 Stuttgart, Germany
Tel.: (0049) 711 / 685 64400
Fax.: (0049) 711 / 685 64442
e-mail: werner at theochem.uni-stuttgart.de