[molpro-user] Writew error

Shenggang Li shenggangli at gmail.com
Wed Sep 2 18:22:45 BST 2009


Andy,

Thanks a lot for the reply.  I was about to update everyone on this.
Working with the people who built the cluster, we finally found that the
problem was caused by the way the scratch filesystem had been built.  For
some reason, the ext3 filesystem was created with a block size of 1KB, which
limits each file to about 16GB, regardless of how much total storage we
have.  Please read this wiki page for more information:

http://en.wikipedia.org/wiki/Ext3
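
For anyone who wants to check their own scratch area, here is a minimal
sketch in Python (standard library only) that reports the block size and
capacity of the filesystem holding a directory.  The function name
describe_filesystem and the use of TMPDIR as the scratch location are
assumptions for illustration; substitute the scratch path printed at the top
of your Molpro output.

```python
import os
import tempfile


def describe_filesystem(path):
    """Return block size and capacity figures for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        "path": os.path.abspath(path),
        # On ext2/ext3 this is expected to match the block size chosen
        # when the filesystem was created.
        "block_size": st.f_frsize,
        "total_bytes": st.f_frsize * st.f_blocks,
        "free_bytes": st.f_frsize * st.f_bavail,
    }


if __name__ == "__main__":
    # Assumption: the scratch directory is given by TMPDIR; fall back to
    # the system temporary directory otherwise.
    scratch = os.environ.get("TMPDIR", tempfile.gettempdir())
    info = describe_filesystem(scratch)
    print(f"path:       {info['path']}")
    print(f"block size: {info['block_size']} bytes")
    print(f"total:      {info['total_bytes'] / 2**30:.1f} GiB")
    print(f"free:       {info['free_bytes'] / 2**30:.1f} GiB")
```

A block size of 1024 on an ext3 scratch filesystem would point to the same
per-file limit we ran into, even when the free space looks plentiful.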

Anyway, we rebuilt the filesystem with a block size of 4096 bytes using the
-b option, and all the jobs are running properly now.  According to the above
wiki page, the maximum block size for the x86_64 platform is 4KB, so a single
file is limited to about 2TB, but that's sufficient for us.
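
To make the arithmetic concrete, here is a rough sketch in Python of where
these per-file limits come from and how the offset in the writew error quoted
further down compares with them.  It assumes the classic ext2/ext3 layout (12
direct block pointers plus single, double and triple indirect blocks, with
4-byte pointers) and the 32-bit count of 512-byte sectors in the inode; it
also assumes that the offset in the error message is counted in 8-byte words
from the start of the file.  The function name ext3_max_file_size is made up
for illustration.

```python
def ext3_max_file_size(block_size):
    """Approximate ext2/ext3 per-file limit in bytes for a given block size."""
    pointers = block_size // 4  # 4-byte block pointers per indirect block
    # 12 direct blocks plus single, double and triple indirect trees
    mapped_blocks = 12 + pointers + pointers**2 + pointers**3
    block_map_limit = mapped_blocks * block_size
    # The inode's 32-bit count of 512-byte sectors caps a file near 2 TiB
    sector_count_limit = 2**32 * 512
    return min(block_map_limit, sector_count_limit)


WORD_BYTES = 8  # assumption: one Molpro word is 8 bytes
# Offset taken from "ERROR WRITING 32768 WORDS AT OFFSET 2324742792."
failed_offset_bytes = 2324742792 * WORD_BYTES

for block_size in (1024, 2048, 4096):
    limit = ext3_max_file_size(block_size)
    print(f"{block_size:5d}-byte blocks: limit {limit / 2**30:8.2f} GiB, "
          f"offset within limit: {failed_offset_bytes < limit}")
```

Under these assumptions the failing offset lies beyond the limit for 1KB
blocks but well within the limit for 4KB blocks, which is consistent with
what we saw before and after rebuilding the filesystem.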

Best regards,

Shenggang.

On Wed, Sep 2, 2009 at 4:19 AM, Andy May <MayAJ1 at cardiff.ac.uk> wrote:

> Shenggang,
>
> The only times I have seen such errors have been in relation to disk
> space. Just to check, does Molpro actually use the file system which is
> 2TB in size? (You can see the location of the scratch directories at the
> top of the output.) Sometimes, especially on Lustre file systems, PBS is
> configured to set TMPDIR to a local, but small, tmp directory and one
> must manually specify the Lustre file system if desired.
>
> Please could you post the input file and I will try to run it locally.
>
> Best wishes,
>
> Andy
>
> Shenggang Li wrote:
> > Dear Jorg,
> >
> > Thanks for the reply.  Like I said we have large local storage (2TB per
> > node), and there is only one job running on a node at a time.  The queue
> > also takes care of creating and deleting the scratch directory, so it
> > shouldn't have anything left over from previous jobs.  Scratch disk
> > usage is fairly low, less than 100GB for these failing jobs.
> >
> > Shenggang.
> >
> > On Wed, Aug 26, 2009 at 12:16 PM, Jörg Saßmannshausen
> > <jorg.sassmannshausen at strath.ac.uk> wrote:
> >
> >     Dear  Shenggang Li,
> >
> >     could it be that your disc is simply full? At least I get this error
> >     when my scratch is up to the brim (i.e. full).
> >
> >     I could be wrong, of course, as I am new to Molpro.
> >
> >     All the best
> >
> >     Jörg
> >
> >     On Tuesday 25 August 2009 Shenggang Li wrote:
> >     > Dear molpro users,
> >     >
> >     > Recently we have been having problems running large Molpro jobs on our
> >     > Opteron clusters.  We tried both 2008.1 (patch level 46) and 2009.1 (patch
> >     > level 1) built with different compilers and libraries (Intel version 9.1 or
> >     > 11.1, PGI version 9.0, MKL or ACML, TCGMSG or MPICH with GA); they all ended
> >     > up with the following error message (highlighted in red below) when doing
> >     > the triplets.  The thing is that all of these errors seem to occur only for
> >     > open-shell systems.  Our system has 32GB physical memory and 2TB scratch
> >     > space per node.  The shmmax was set to about 8GB.  We would greatly
> >     > appreciate it if anyone can help us with this problem.  Thanks,
> >     >
> >     > 1PROGRAM * CCSD (Unrestricted open-shell coupled cluster)     Authors: C. Hampel, H.-J. Werner, 1991, M. Deegan, P.J. Knowles, 1992
> >     >
> >     >  Convergence thresholds:  THRVAR = 1.00D-10  THRDEN = 1.52D-06
> >     >  CCSD(T)     terms to be evaluated (factor= 1.000)
> >     >
> >     >  Number of core orbitals:          22 (  22 )
> >     >  Number of closed-shell orbitals:  15 (  15 )
> >     >  Number of active  orbitals:        2 (   2 )
> >     >  Number of external orbitals:     402 ( 402 )
> >     >
> >     >  ::: For full I/O caching in triples, increase memory by********* words
> >     >  ::: to1755.8 Mword
> >     >
> >     >  Number of N-1 electron functions:              32
> >     >  Number of N-2 electron functions:             496
> >     >  Number of singly external CSFs:             12928
> >     >  Number of doubly external CSFs:          61238926
> >     >  Total number of CSFs:                    61251854
> >     >
> >     >  Molecular orbitals read from record     2100.2  Type=RHF/CANONICAL (state 1.1)
> >     >  Multipassing necessary in transformation. To avoid, increase memory by 128106603 words
> >     >
> >     >  Integral transformation finished. Total CPU:2901.60 sec, npass=  2  Memory used: 683.45 MW
> >     >  Starting RMP2 calculation
> >     >
> >     >  ITER.      SQ.NORM     CORR.ENERGY   TOTAL ENERGY   ENERGY CHANGE        DEN1      VAR(S)    VAR(P)  DIIS     TIME
> >     >    1      1.33546257    -1.16144736 -4335.92514133    -1.16144736    -0.00431235  0.43D-04  0.21D-02  1  1   3183.04
> >     >    2      1.34059958    -1.16610638 -4335.92980035    -0.00465902    -0.00001448  0.13D-05  0.10D-04  2  2   3261.67
> >     >    3      1.34089992    -1.16624621 -4335.92994017    -0.00013983    -0.00000007  0.30D-07  0.41D-07  3  3   3341.89
> >     >    4      1.34090974    -1.16624789 -4335.92994185    -0.00000168     0.00000000  0.55D-09  0.20D-09  4  4   3430.21
> >     >    5      1.34090996    -1.16624793 -4335.92994189    -0.00000004     0.00000000  0.62D-11  0.18D-11  5  5   3514.72
> >     >
> >     >  Norm of t1 vector:      0.04235068      S-energy:    -0.00137917      T1 diagnostic:  0.00032679
> >     >  Norm of t2 vector:      0.58233700      P-energy:    -1.16486875
> >     >                                          Alpha-Beta:  -0.84631561
> >     >                                          Alpha-Alpha: -0.17609129
> >     >                                          Beta-Beta:   -0.14246185
> >     >  Spin contamination <S**2-Sz**2-Sz>     0.00222417
> >     >   Reference energy                  -4334.763693963800
> >     >   RHF-RMP2 correlation energy          -1.166247926020
> >     >  !RHF-RMP2 energy                   -4335.929941889820
> >     >  Starting UCCSD calculation
> >     >
> >     >  ITER.      SQ.NORM     CORR.ENERGY   TOTAL ENERGY   ENERGY CHANGE        DEN1      VAR(S)    VAR(P)  DIIS     TIME
> >     >    1      1.31348357    -1.11372131 -4335.87741527    -1.11372131    -0.03770484  0.15D-01  0.60D-02  1  1   6225.50
> >     >    2      1.34810127    -1.14418474 -4335.90787871    -0.03046344    -0.00519474  0.11D-02  0.19D-02  2  2   8871.00
> >     >    3      1.37157079    -1.14982704 -4335.91352101    -0.00564230    -0.00118521  0.88D-03  0.26D-03  3  3  11413.76
> >     >    4      1.38817432    -1.15396848 -4335.91766245    -0.00414144    -0.00031068  0.24D-03  0.75D-04  4  4  13988.41
> >     >    5      1.40190914    -1.15581273 -4335.91950669    -0.00184424    -0.00008174  0.75D-04  0.18D-04  5  5  16572.34
> >     >    6      1.40879522    -1.15635171 -4335.92004567    -0.00053898    -0.00002803  0.27D-04  0.61D-05  6  6  19124.55
> >     >    7      1.41293148    -1.15665580 -4335.92034976    -0.00030409    -0.00001046  0.11D-04  0.19D-05  6  1  21750.82
> >     >    8      1.41481085    -1.15669258 -4335.92038654    -0.00003678    -0.00000373  0.38D-05  0.67D-06  6  3  24290.46
> >     >    9      1.41627012    -1.15678864 -4335.92048261    -0.00009607    -0.00000101  0.96D-06  0.20D-06  6  2  26889.37
> >     >   10      1.41674986    -1.15681254 -4335.92050650    -0.00002389    -0.00000037  0.31D-06  0.10D-06  6  5  29475.05
> >     >   11      1.41709203    -1.15682888 -4335.92052284    -0.00001634    -0.00000015  0.94D-07  0.59D-07  6  4  32040.35
> >     >   12      1.41725234    -1.15683361 -4335.92052757    -0.00000473    -0.00000006  0.26D-07  0.30D-07  6  6  34610.82
> >     >   13      1.41731799    -1.15682847 -4335.92052243     0.00000514    -0.00000003  0.95D-08  0.15D-07  6  1  37184.58
> >     >   14      1.41733825    -1.15682649 -4335.92052045     0.00000198    -0.00000001  0.45D-08  0.54D-08  6  3  39758.31
> >     >   15      1.41733803    -1.15682275 -4335.92051671     0.00000374     0.00000000  0.25D-08  0.18D-08  6  2  42348.32
> >     >   16      1.41731431    -1.15682119 -4335.92051515     0.00000156     0.00000000  0.13D-08  0.51D-09  6  4  44911.04
> >     >   17      1.41730629    -1.15682024 -4335.92051420     0.00000095     0.00000000  0.69D-09  0.88D-10  6  5  47479.18
> >     >   18      1.41729851    -1.15682007 -4335.92051403     0.00000017     0.00000000  0.37D-09  0.41D-10  6  6  50043.26
> >     >   19      1.41729471    -1.15681998 -4335.92051394     0.00000009     0.00000000  0.23D-09  0.32D-10  6  5  52580.90
> >     >   20      1.41729089    -1.15681991 -4335.92051387     0.00000007     0.00000000  0.10D-09  0.26D-10  6  1  55151.19
> >     >   21      1.41729040    -1.15681976 -4335.92051372     0.00000015     0.00000000  0.37D-10  0.16D-10  6  3  57716.37
> >     >
> >     >  Norm of t1 vector:      0.25244862      S-energy:    -0.00315163      T1 diagnostic:  0.02624590
> >     >                                                                        D1 diagnostic:  0.07310115
> >     >  Norm of t2 vector:      0.59460919      P-energy:    -1.15366813
> >     >                                          Alpha-Beta:  -0.87275562
> >     >                                          Alpha-Alpha: -0.15779991
> >     >                                          Beta-Beta:   -0.12311260
> >     >  Singles amplitudes (print threshold =  0.500E-01):
> >     >          I         SYM. A    A   T(IA) [Alpha-Alpha]
> >     >         15         1         7     -0.05822650
> >     >
> >     >          I         SYM. A    A   T(IA) [Beta-Beta]
> >     >         15         1        18     -0.05411281
> >     >  Doubles amplitudes (print threshold =  0.500E-01):
> >     >          I         J         SYM. A    SYM. B    A         B      T(IJ, AB) [Alpha-Beta]
> >     >         15        15         1         1         7         7     -0.05298773
> >     >  Spin contamination <S**2-Sz**2-Sz>     0.03094270
> >     >  ERROR WRITING        32768 WORDS AT OFFSET 2324742792. TO FILE 5  IMPLEMENTATION=sf   FILE HANDLE= -2997  IERR= -2000
> >     >  ? Error
> >     >  ? I/O error
> >     >  ? The problem occurs in writew
> >     >
> >     > --
> >     > Shenggang Li
> >     >
> >     > Ours is essentially a tragic age, so we refuse to take it tragically.
> >     > -- D. H. Lawrence
> >
> >
> >
> >     --
> >     *************************************************************
> >     Jörg Saßmannshausen
> >     Research Fellow
> >     University of Strathclyde
> >     Department of Pure and Applied Chemistry
> >     295 Cathedral St.
> >     Glasgow
> >     G1 1XL
> >
> >     email: jorg.sassmannshausen at strath.ac.uk
> >     web: http://sassy.formativ.net
> >
> >     Please avoid sending me Word or PowerPoint attachments.
> >     See http://www.gnu.org/philosophy/no-word-attachments.html
> >     _______________________________________________
> >     Molpro-user mailing list
> >     Molpro-user at molpro.net
> >     http://www.molpro.net/mailman/listinfo/molpro-user
> >
>



-- 
Shenggang Li

Ours is essentially a tragic age, so we refuse to take it tragically. -- D.
H. Lawrence