[molpro-user] problems with global file system when running in parallel

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Mon Feb 4 12:42:37 GMT 2013


Dear all,

I was wondering if somebody could shed some light on this.

When I try to run a DF-LCCSD(T) calculation, the first few steps work fine, 
but then the program crashes at this point:

 MP2 energy of close pairs:            -0.09170948
 MP2 energy of weak pairs:             -0.06901764
 MP2 energy of distant pairs:          -0.00191297

 MP2 correlation energy:               -2.48344057
 MP2 total energy:                   -940.89652776

 LMP2 singlet pair energy              -1.53042229
 LMP2 triplet pair energy              -0.95301828

 SCS-LMP2 correlation energy:          -2.42949590   (PS=  1.200000  PT=  0.333333)
 SCS-LMP2 total energy:              -940.84258309

 Minimum Memory for K-operators:     2.48 MW Maximum memory for K-operators    28.97 MW  used:    28.97 MW
 Memory for amplitude vector:        0.52 MW

 Minimum memory for LCCSD:     8.15 MW, used:     65.01 MW, max:     64.48 MW

 ITER.      SQ.NORM     CORR.ENERGY   TOTAL ENERGY   ENERGY CHANGE        DEN1      VAR(S)    VAR(P)  DIIS     TIME
   1      1.96000293    -2.52977250  -940.94285970    -0.04633193    -2.42872569  0.35D-01  0.15D-01  1  1   348.20


Here are the error messages which I found:

5:Segmentation Violation error, status=: 11
(rank:5 hostname:node32 pid:5885):ARMCI DASSERT fail. 
src/common/signaltrap.c:SigSegvHandler():310 cond:0
  5: ARMCI aborting 11 (0xb).
tmp = /home/sassy/pdir//usr/local/molpro-2012.1/bin/molpro.exe.p
 Creating: host=node33, user=sassy,
[ ... ] 

and

Last System Error Message from Task 5:: Bad file descriptor
  5: ARMCI aborting 11 (0xb).
system error message: Invalid argument
 24: interrupt(1)
Last System Error Message from Task 2:: Bad file descriptor
Last System Error Message from Task 0:: Inappropriate ioctl for device
  2: ARMCI aborting 2 (0x2).
system error message: Invalid argument
Last System Error Message from Task 3:: Bad file descriptor
  3: ARMCI aborting 2 (0x2).
system error message: Invalid argument
WaitAll: Child (25216) finished, status=0x8200 (exited with code 130).
[ ... ]

I have the feeling there is a problem with reading or writing some files.
The global file system has around 158 GB of free disk space, and as far as I 
could see it was not full at the time of the run.
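To back up the "not full at the time of the run" observation with numbers, the 
free space on the scratch file system can be sampled while the job is running. 
The following is a minimal Python 3 sketch using only the standard library 
(shutil.disk_usage); the function names and the default sampling interval are 
illustrative assumptions and are not part of Molpro:

```python
import shutil
import sys
import time


def free_gib(path: str) -> float:
    """Free space, in GiB, on the file system that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def watch(path: str, interval_s: float = 60.0, count: int = 10) -> float:
    """Sample free space `count` times, `interval_s` seconds apart,
    and return the lowest value seen (in GiB)."""
    lowest = float("inf")
    for i in range(count):
        lowest = min(lowest, free_gib(path))
        if i < count - 1:  # no need to sleep after the last sample
            time.sleep(interval_s)
    return lowest


if __name__ == "__main__":
    # Hypothetical usage: python check_free.py /path/to/global/scratch
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: {free_gib(target):.1f} GiB free now")
```

Running watch() in the background for the duration of the job gives the lowest 
free space actually reached, which says more than a single snapshot taken 
before or after the run.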

Interestingly, the same input file worked when I used the local scratch space. 
As the local scratch is rather small, I would prefer to use the larger global 
file system.

Are there any known problems with that approach, or is there something I am 
doing wrong here?
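Since the same input ran with local scratch but not on the global file system, 
one rough way to compare the two is a write, fsync and read-back round trip in 
each directory. The sketch below is a hypothetical Python 3 helper (standard 
library only, names chosen for illustration). It exercises a single process 
only, so a pass does not rule out problems that appear when several ranks on 
different nodes access the global file system at the same time, as in the 
ARMCI run above:

```python
import os
import sys
import tempfile


def roundtrip_ok(directory: str, size_mib: int = 64) -> bool:
    """Write `size_mib` MiB to a temporary file in `directory`, fsync it,
    read it back and compare. Returns True if the data survives intact."""
    block = os.urandom(1024 * 1024)  # 1 MiB of random bytes
    fd, path = tempfile.mkstemp(dir=directory, prefix="scratch_test_")
    try:
        with os.fdopen(fd, "wb") as f:  # the file object now owns fd
            for _ in range(size_mib):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force data out to the (possibly remote) file system
        with open(path, "rb") as f:
            for _ in range(size_mib):
                if f.read(len(block)) != block:
                    return False
            return f.read(1) == b""  # no unexpected trailing data
    finally:
        os.remove(path)  # always clean up the test file


if __name__ == "__main__":
    # Hypothetical usage: python roundtrip.py /local/scratch /global/scratch
    for d in sys.argv[1:] or ["."]:
        print(d, "OK" if roundtrip_ok(d) else "FAILED")
```

The paths in the usage line are placeholders; each directory passed on the 
command line is reported as OK or FAILED.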

All the best from a sunny London

Jörg

-- 
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html