[molpro-user] I/O error on large ccsd(t) job

Gert von Helden helden at fhi-berlin.mpg.de
Thu Dec 21 16:12:22 GMT 2006


Dear all,
I am trying to run a large CCSD(T) calculation using Molpro 2006.1
on an Opteron system. I compiled with pgf 6.1 (and also tried ifort)
and built a serial and an MPI parallel version, linked with GA 4.0.1
and MPICH. All program versions passed the quicktests.
In all cases, the job fails with an I/O-related error. Lack of disk
space is very unlikely to be the cause (I also tried using a SAN with
very large capacity as scratch).
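
For anyone who wants to rule out disk space in the same way, here is a
minimal sketch of checking the free space on the scratch filesystem
from Python. The function name and the choice of directory are
placeholders for illustration; the system temp directory stands in
for the actual Molpro scratch path, which is not given here.

```python
import shutil
import tempfile


def free_scratch_gb(path):
    """Return the free space on the filesystem holding `path`, in GB (10**9 bytes)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free / 1e9


if __name__ == "__main__":
    # Placeholder: substitute the real scratch directory of the job.
    scratch_dir = tempfile.gettempdir()
    print(f"{scratch_dir}: {free_scratch_gb(scratch_dir):.1f} GB free")
```

Note that this only reports the space available at the moment it is
run, so it would have to be checked while the job is writing its
integral file to be meaningful.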

I would appreciate any help!

...Gert


For a parallel job running on 4 CPUs, I get the following (using 2
CPUs or only 1 gives variations of the same error):

>
>      108174.246 MB (compressed) written to integral file ( 43.0%)
>
>      Node minimum: 26165.903 MB, node maximum: 27669.299 MB
>
>
> NUMBER OF SORTED TWO-ELECTRON INTEGRALS: 7385262466.     BUFFER LENGTH:  32768
> NUMBER OF SEGMENTS: 155  SEGMENT LENGTH:   47998888      RECORD LENGTH: 262144
>
> Memory used in sort:      48.29 MW
>
> SORT1 READ31451116099. AND WROTE 5877056503. INTEGRALS IN33710 RECORDS. CPU TIME:  4100.09 SEC, REAL TIME:  5524.84 SEC
> 3:3:fehler on processor   3:: 4921228



The serial version gives the following (there were certainly still a
few hundred GB of scratch space available):

>  Contracted 2-electron integrals neglected if value below      1.0D-13
> AO integral compression algorithm  1   Integral accuracy      1.0D-13
>
>      108173.984 MB (compressed) written to integral file ( 43.0%)
>
>
> NUMBER OF SORTED TWO-ELECTRON INTEGRALS:29541429126.     BUFFER LENGTH:  32768
> NUMBER OF SEGMENTS: 185  SEGMENT LENGTH:  159999781      RECORD LENGTH: 524288
>
> Memory used in sort:     160.56 MW
>
> SORT1 READ31451116099. AND WROTE23507657552. INTEGRALS IN67352 RECORDS. CPU TIME:  1633.81 SEC, REAL TIME:  6599.01 SEC
> Read error in iow_direct_read; fd=13, l=32768, p=30540206080; read returns -1
>
> ERROR READING        32768 WORDS AT OFFSET30540206080. FROM FILE 4  IMPLEMENTATION=df    FILE HANDLE=    13  IERR=******
>
> Records on file 4
>
> IREC   NAME  TYPE        OFFSET    LENGTH   IMPLEMENTATION   EXT   PREV   PARENT  MPP_STATE
>    1    1350  BUCK         4096.**********         df           0      0      0      0
>
> ? Error
> ? I/O error
> ? The problem occurs in readw
>
> ERROR EXIT
> CURRENT STACK:      MAIN
>
>
> **********************************************************************************************************************************
> DATASETS  * FILE   NREC   LENGTH (MB)   RECORD NAMES
>               1      18   104522.63       500      610      700      900      950      970     1000     1100     1400     1410
>                                           VAR    BASINP    GEOM    SYMINP    ZMAT    AOBASIS   BASIS      S        T        V
>                                          1200     1210     1080     1600      129      960     1650     1300
>                                           H0       H01     AOSYM     SMH      P2S    ABASIS   MOLCAS    ERIS
>
>               2       4        1.33       500      610      700     1000
>                                           VAR    BASINP    GEOM     BASIS
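
As a rough aid to reading the error message in the serial output, the
failing offset can be converted from words to bytes. This is only a
sketch under the assumption that the offset and length are counted in
8-byte words; neither the word size nor the units of `p` are stated in
the output itself, and the helper names are made up for illustration.

```python
WORD_BYTES = 8  # assumption: one word is 8 bytes; not confirmed by the log


def words_to_bytes(n_words, word_bytes=WORD_BYTES):
    """Convert a count of words to a count of bytes."""
    return n_words * word_bytes


def bytes_to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Values copied from the serial error message:
    # "ERROR READING 32768 WORDS AT OFFSET30540206080. FROM FILE 4"
    offset_words = 30540206080
    length_words = 32768

    offset_bytes = words_to_bytes(offset_words)
    length_bytes = words_to_bytes(length_words)

    print(f"read length: {length_bytes} bytes")
    print(f"read offset: {offset_bytes} bytes ({bytes_to_gib(offset_bytes):.1f} GiB)")
```

If the offset is in fact already given in bytes, the figures are a
factor of 8 smaller.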



