File header error running Molpro on multiple cluster nodes

Karen Haskell khaskell at atcc.necsys.com
Mon Aug 25 22:24:02 BST 2003


I've built Molpro 2002.6 on our PC cluster:
     8 nodes, each with 2 Pentium CPUs
     RedHat 9

I'm using GA 3.2.6 (built with ARMCI_NETWORK=SOCKETS and tested OK on all
processors), Intel ifc 7.1, and mpich-1.2.5.

It runs fine with -n1, and also with -n2 as long as both processes are on one
node.

When I try to run on multiple nodes, e.g. with -n4, the processes start okay and
I can see 2 processes on each of 2 nodes. The job writes some output, then fails
with a file header error. The processes do not exit and must be killed.

Here is the start and end of the output file (h2o_vdz.out):
----------------------------------------------------------------------------------

ARMCI configured for 2 cluster nodes

 MPP nodes  nproc
 r2d2         2
 obiwan       2
 ga_uses_ma=false, calling ma_init with nominal heap. Any -G option will be ignored.

 Primary working directories:    /tmp/molpro
 Secondary working directories:  /tmp/molpro
 ...
 etc.
 ...
 Variable memory set to    1000000 words,  buffer space   230000 words



 Using spherical harmonics

Bad seek in iow_direct_write; fd=-1, p=4096
Bad seek in iow_direct_write; fd=-1, p=4096
-10000(s):armci_rcv_req: failed to receive header : 2
0:Child process terminated prematurely, status=: 256
Bad seek in iow_direct_write; fd=-1, p=4135
-10002(s):armci_rcv_req: failed to receive header : 2
Bad seek in iow_direct_write; fd=-1, p=4135
----------------------------------------------------------------------------------

Is this a problem with how the cluster is configured, how mpich is configured,
or how Molpro is configured? Or is it something else?

Any help would be appreciated.

Karen Haskell
khaskell at atcc.necsys.com
     



