[molpro-user] location of scratch files in parallel runs

Dr. Anatoliy Volkov avolkov at mtsu.edu
Wed Jun 17 19:00:46 BST 2009


Greetings,

I have a question about the location of scratch files when running the parallel
(64-bit MPP) version of Molpro 2008.1 on a cluster of SMP nodes.

In my PBS/TORQUE script I create a temporary directory on each
of the nodes for Molpro to store scratch files:

set SCR = "/scratch/$PBS_JOBID"
foreach node ($NODES)
   rsh $node mkdir -p $SCR
end

where $NODES is a list of nodes on which the job will be executed
and $PBS_JOBID is a unique ID assigned to a job by PBS/TORQUE.

For example, in the job I am running right now, PBS_JOBID =
7840.voltron.mtsu.edu, so the scratch directory
/scratch/7840.voltron.mtsu.edu is created on each of the nodes.
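
(For completeness, the matching cleanup at the end of the job script would be
something along these lines, assuming nothing in $SCR needs to be kept:

foreach node ($NODES)
   rsh $node rm -rf $SCR
end

but that part is not relevant to the problem described below.)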

I then invoke the molpro script and let it deal with $PBS_NODEFILE:
/usr/local/molpro/molpro -o $ofile -d $SCR $ifile

where $ifile is the Molpro input file name and $ofile is the Molpro output file name.

I use the -d option to tell Molpro to use the newly created temporary
directories for its scratch files, but I have decided not to use the -N and
-n options, as the molpro script seems to be able to extract all the
necessary information from $PBS_NODEFILE by itself.
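
(If that automatic detection ever turned out to be the problem, I suppose the
process count could also be passed explicitly, something along these lines,
assuming -n simply takes the total number of processes:

set NPROC = `wc -l < $PBS_NODEFILE`
/usr/local/molpro/molpro -n $NPROC -o $ofile -d $SCR $ifile

For now, though, I rely on the node file alone.)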

In the Molpro output file I get:

-----------------------------------------------------------------
 Primary working directories    : /scratch/7840.voltron.mtsu.edu
 Secondary working directories  : /scratch/7840.voltron.mtsu.edu
 Wavefunction directory         : /home/avolkov/wfu/
 Main file repository           : /scratch/7840.voltron.mtsu.edu/
 
 cpu       : Intel(R) Xeon(R) CPU           E5320  @ 1.86GHz 1862.023 MHz
 FC        : /opt/intel/fce/10.1.008/bin/ifort
 FCVERSION : 10.1
 BLASLIB   : -L/opt/intel/mkl/9.1/lib/em64t -lmkl_em64t -lguide 
-lpthread -openmp
 id        : mtsu

 MPP nodes nproc
 tron09       8
 tron08       8
 tron07       8
 tron06       8
 ga_uses_ma=false, calling ma_init with nominal heap.
 GA-space will be limited to  64.0 MW (determined by -G option)

 MPP tuning parameters: Latency=     0 Microseconds,   Broadcast 
speed=    0 MB/sec
 default implementation of scratch files=ga 
-----------------------------------------------------------------

It does seem like the program reads the names of the scratch directories
correctly. In fact, I can see that Molpro created some Fortran temporary
files in /scratch/7840.voltron.mtsu.edu/ on each of the nodes, but these
files do not grow in size as the calculation proceeds:

avolkov at tron09:/scratch/7840.voltron.mtsu.edu> ls -ltrh
total 164K
-rw-r--r-- 1 avolkov chem 2.9K 2009-06-17 08:30 procgrp.30568
-rw------- 1 avolkov chem   30 2009-06-17 08:31 fortZgGExZ
-rw------- 1 avolkov chem   30 2009-06-17 08:31 fortvM4eyZ
-rw------- 1 avolkov chem 2.9K 2009-06-17 08:31 fortOUJ0Dk
-rw------- 1 avolkov chem   30 2009-06-17 08:31 fortoo0HxZ
-rw------- 1 avolkov chem 2.6K 2009-06-17 08:31 fortNfhGvX
-rw------- 1 avolkov chem   30 2009-06-17 08:31 fortkIsLxZ
-rw------- 1 avolkov chem   30 2009-06-17 08:31 fortk5eiyZ
-rw------- 1 avolkov chem   30 2009-06-17 08:31 fortHmYPxZ
-rw------- 1 avolkov chem   30 2009-06-17 08:31 fortehGExZ
-rw------- 1 avolkov chem    0 2009-06-17 08:31 fort8IuJm5
-rw------- 1 avolkov chem   73 2009-06-17 08:31 fortWTWm0H
-rw------- 1 avolkov chem 116K 2009-06-17 12:03 forth156Is

avolkov at tron08:/scratch/7840.voltron.mtsu.edu> ls -ltrh
total 32K
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortVsDUGy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 forttBroHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortrC8eHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortMBroHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortLYrmHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortHXwJGy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortDaB5Gy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortB6XqHy

etc

These files seem far too small for the integral files that Molpro reports in the output:

-----------------------------------------------------------------
 Contracted 2-electron integrals neglected if value below      1.0D-11
 AO integral compression algorithm  1   Integral accuracy      1.0D-11

     11909.464 MB (compressed) written to integral file ( 18.7%)

     Node minimum: 336.331 MB, node maximum: 403.177 MB
 

 NUMBER OF SORTED TWO-ELECTRON INTEGRALS:  827943072.     BUFFER 
LENGTH:  32768
 NUMBER OF SEGMENTS:  58  SEGMENT LENGTH:   14581568      RECORD LENGTH: 
131072

 Memory used in sort:      14.75 MW

 SORT1 READ  7975193852. AND WROTE   104517941. INTEGRALS IN   1223 
RECORDS. CPU TIME:   187.08 SEC, REAL TIME:   630.37 SEC
 SORT2 READ  3323600146. AND WROTE 26491761471. INTEGRALS IN  49216 
RECORDS. CPU TIME:    19.95 SEC, REAL TIME:   117.89 SEC

 Node minimum:   827756054.  Node maximum:   827979042. integrals

 OPERATOR DM      FOR CENTER  0  COORDINATES:    0.000000    0.000000    
0.000000


 **********************************************************************************************************************************
 DATASETS  * FILE   NREC   LENGTH (MB)   RECORD NAMES
              1      18       20.03       500      610      700      
900      950      970     1000      129      960     1100  
                                          VAR    BASINP    GEOM    
SYMINP    ZMAT    AOBASIS   BASIS     P2S    ABASIS      S
                                         1400     1410     1200     
1210     1080     1600     1650     1700  
                                           T        V       H0       
H01     AOSYM     SMH    MOLCAS    OPER  
 
 PROGRAMS   *        TOTAL       INT
 CPU TIMES  *       253.96    253.78
 REAL TIME  *       809.18 SEC
 DISK USED  *        53.14 GB     
 GA USED    *         0.11 MB       (max)       0.00 MB       (current)
 **********************************************************************************************************************************


As I understand the output, the disk usage should be tens of gigabytes in
total (DISK USED reports 53.14 GB), i.e. several gigabytes per node, while
the /scratch/7840.voltron.mtsu.edu directory on each of the nodes contains
less than a megabyte of data.
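
(To quantify this, the actual on-disk usage on every node can be checked with
something like:

foreach node ($NODES)
   rsh $node du -sh $SCR
end

which should simply reproduce the sub-megabyte totals shown by ls above.)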

Does this really mean that all these scratch files are allocated in GA, as
indicated at the beginning of the Molpro output file, i.e.
-----------------------------------------------------------------
default implementation of scratch files=ga 
-----------------------------------------------------------------

It does seem so: according to 'top', each process uses about 900 MB of
virtual memory, even though only about 160 MB are resident.

For example, on the master node tron09:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30899 avolkov   25   0  888m 159m 7236 R  101  1.0 215:09.04 molprop_2008_1_
30900 avolkov   25   0  888m 159m 7240 R  100  1.0 215:50.31 molprop_2008_1_
30901 avolkov   25   0  888m 159m 7172 R  100  1.0 215:57.17 molprop_2008_1_
30903 avolkov   25   0  888m 159m 7096 R  100  1.0 215:10.51 molprop_2008_1_
30914 avolkov   25   0  888m 159m 7068 R  100  1.0 214:08.87 molprop_2008_1_
30904 avolkov   25   0  888m 159m 7092 R   99  1.0 215:11.66 molprop_2008_1_
30912 avolkov   25   0  888m 159m 7088 R   80  1.0 215:13.30 molprop_2008_1_
30898 avolkov   15   0  892m 160m 7504 R   37  1.0 118:15.30 molprop_2008_1_

and similarly on all the slave nodes.

However, this seems to contradict what is reported in the Molpro output:

 DISK USED  *        53.14 GB     
 GA USED    *         0.11 MB       (max)       0.00 MB       (current)

which suggests that it is disk space that is used, not GA.

Am I doing something wrong when submitting the job, or am I misinterpreting
the Molpro output?

I do want all temporary files either to be written to the local /scratch
disk space on each of the nodes, or to be kept in GA.

Any help would be very much appreciated.

Thank you,
Anatoliy

-- 
Anatoliy Volkov, Ph.D.
Associate Professor
Department of Chemistry
Middle Tennessee State University
239 Davis Science Bldg.
MTSU Box 68
Murfreesboro, TN 37132

E-mail: avolkov at mtsu.edu
  Fax: (615) 898-5182
Phone: (615) 494-8655




