[molpro-user] Parallel Molpro sometimes not working on Opteron

Reuti reuti at staff.uni-marburg.de
Sat Nov 13 14:25:30 GMT 2004


Hi,

my configuration on a Linux system on Opteron is:

Linux-2.4.21-15.ELsmp/icompile(x86_64) 64 bit version
Molpro 2002.6 patch level 74
Atlas 3.6.0
Lapack 3.0+
GlobalArrays 3.3.1
gcc 3.3.5
pgf90 5.1-6

First I compiled Atlas and LAPACK on my own, ran the test suites for these and 
built the serial Molpro. Everything runs fine when I now run "make test" for 
Molpro.

The problem starts when trying to run it in parallel (using a completely new 
build for parallel, of course). I compiled GlobalArrays according to the build 
instructions (with a small change in parallel.c to use "rsh" instead of 
"/usr/bin/rsh"). The GlobalArrays test program runs fine, and I can compile 
the parallel Molpro. The parallel test suite of Molpro looks good at first, 
up to:

make MOLPRO_OPTIONS=-n2 test
..
<snip>
..
cp -p /home/reuti/local/ga-3.3.1/bin/parallel bin/parallel
make[1]: Entering directory `/home/reuti/molpro2002.6/testjobs'
Running test job h2o_vdz.test
Running test job registry.test
Running test job h2o_select.test
Running test job h2o_explicit.test
Running test job n2_restrict.test
Running test job h2o_ano.test
Received signal 11 Segmentation violation
1:1:fehler:: 0
Last System Error Message from Task 1:: No such file or directory
  1: ARMCI aborting 0 (0).
system error message: No such file or directory
0:Child process terminated prematurely, status=: 256
Last System Error Message from Task 0:: No such file or directory
  0: ARMCI aborting 256 (0x100).
system error message: No such file or directory
  2: interrupt(1)
WaitAll: No children or error in wait?
**** PROBLEMS WITH TEST JOB h2o_ano.test
h2o_ano.test: ERRORS DETECTED: non-zero return code ... inspect output
**** For further information, look in the output file testjobs/h2o_ano.errout
**** in the directory 
make[1]: [h2o_ano.out] Error 1 (ignored)

And the contents of the output file testjobs/h2o_ano.errout:

<snip>
..
 Contracted 2-electron integrals neglected if value below      1.0D-11
 AO integral compression algorithm  1   Integral accuracy      1.0D-11

     2.884 MB (compressed) written to integral file ( 62.2%)

     Node minimum: 1.311 MB, node maximum: 1.573 MB


 NUMBER OF SORTED TWO-ELECTRON INTEGRALS:     197996.     BUFFER LENGTH:  32768
 NUMBER OF SEGMENTS:   1  SEGMENT LENGTH:     197996      RECORD LENGTH: 524288

 Memory used in sort:       0.76 MW
1:1:fehler:: 0
  1: ARMCI aborting 0 (0).
0:Child process terminated prematurely, status=: 256
  0: ARMCI aborting 256 (0x100).
tmp = /home/reuti/pdir//home/reuti/molpro2002.6/bin/molprop_2002_6_i8_amd64_tcgmsg.exe.p
 Creating: host=icompile, user=reuti,
           file=/home/reuti/molpro2002.6/bin/molprop_2002_6_i8_amd64_tcgmsg.exe, port=54238
h2o_ano.test: ERRORS DETECTED: non-zero return code ... inspect output


After looking in the source, the "pdir" seems not to be used by GlobalArrays, 
because Molpro creates a file in /tmp (unless $TMPDIR is set) and stores its 
name in the variable $PROCGRP. This file lists the machines to be used, which 
you can supply beforehand in a $PBS_NODEFILE (unless you prepare a $PROCGRP 
file on your own). (BTW: is this documented anywhere?)
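
To make the node-file handling described above concrete, here is a rough Python sketch of how a list of machines (one host name per line, as in a $PBS_NODEFILE) could be turned into a process-group file. The five-field line layout (user, host, number of processes, executable, working directory) is my understanding of the TCGMSG format and should be checked against the TCGMSG documentation shipped with GlobalArrays. The helper names procgrp_lines and read_nodefile are made up for this illustration and are not part of Molpro or GlobalArrays.

```python
import os
from collections import Counter


def read_nodefile(path):
    """Read a node file with one host name per line, skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def procgrp_lines(nodes, user, executable, workdir):
    """Build one process-group line per distinct host, in first-seen order.

    Assumed layout (not confirmed by the post): user host nprocs executable workdir
    """
    counts = Counter(nodes)  # keeps insertion order on Python 3.7+
    return [
        f"{user} {host} {n} {executable} {workdir}"
        for host, n in counts.items()
    ]


if __name__ == "__main__":
    # Use $PBS_NODEFILE if the batch system provides it; otherwise fall back
    # to two processes on the host from the log above (the -n2 test run).
    nodefile = os.environ.get("PBS_NODEFILE")
    nodes = read_nodefile(nodefile) if nodefile else ["icompile", "icompile"]
    exe = "/home/reuti/molpro2002.6/bin/molprop_2002_6_i8_amd64_tcgmsg.exe"
    workdir = os.environ.get("TMPDIR", "/tmp")
    for line in procgrp_lines(nodes, "reuti", exe, workdir):
        print(line)
```

The printed lines could be saved to a file whose name is then exported as $PROCGRP before starting Molpro. Using the working directory field for $TMPDIR is only a guess on my part.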

The question is: where is the German "1:1:fehler::" coming from, and how can I 
get it working?

Cheers - Reuti


