[molpro-user] molpro's job sudden death

Andy May MayAJ1 at cardiff.ac.uk
Wed Apr 21 15:36:29 BST 2010


Ron,

Here are some observations which may be helpful:

1. Molpro's memory specification is per process, not the total
footprint of the job, so make sure that the memory requested multiplied
by the number of processes is less than the total memory available on
the system.
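As a rough illustration of point 1, here is a minimal Python sketch that multiplies the per-process request by the number of processes and compares the result with the machine's memory. The helper names are hypothetical (not part of Molpro), the sketch assumes 8-byte words, and the 16 GiB node is an invented example.

```python
BYTES_PER_WORD = 8  # assumption: 1 word = sizeof(double) = 8 bytes


def total_footprint_bytes(words_per_process: int, nprocs: int) -> int:
    """Memory for the whole job: the Molpro request applies to each process."""
    return words_per_process * BYTES_PER_WORD * nprocs


def fits(words_per_process: int, nprocs: int, total_bytes: int) -> bool:
    """True if the summed per-process requests stay below the machine's memory."""
    return total_footprint_bytes(words_per_process, nprocs) < total_bytes


# memory,500,m on 8 processes: 500e6 words * 8 bytes * 8 processes
print(total_footprint_bytes(500_000_000, 8))   # 32000000000 bytes (32 GB)
print(fits(500_000_000, 8, 16 * 1024**3))      # False on a hypothetical 16 GiB node
```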

2. Molpro memory specification is in words. The conversion is

1 word = 8 bytes

although to be completely accurate:

1 word = sizeof(double) bytes.

A Molpro input file is case insensitive, and memory specified there is
decimal, so:

memory,100,m or memory,100,M

will both set the requested memory to 100*1000*1000 words.
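A minimal Python sketch of how the input-file rule works out. `input_memory_words` is a hypothetical helper, not Molpro code; it lower-cases the line to mimic case insensitivity and, as an assumption, handles only integer counts with no scale, k (1000) or m (1000*1000).

```python
def input_memory_words(directive: str) -> int:
    """Words requested by a 'memory,n,scale' input line (hypothetical helper).

    The input is case insensitive and the scale is decimal, so 'm' and 'M'
    both mean 1000*1000. Only integer n and the k and m scales are handled
    here (assumption).
    """
    fields = [f.strip().lower() for f in directive.split(",")]
    if fields[0] != "memory":
        raise ValueError(f"not a memory directive: {directive!r}")
    n = int(fields[1])
    scale = fields[2] if len(fields) > 2 else ""
    factors = {"": 1, "k": 1000, "m": 1000 * 1000}
    if scale not in factors:
        raise ValueError(f"unsupported scale: {scale!r}")
    return n * factors[scale]


print(input_memory_words("memory,100,m"))  # 100000000
print(input_memory_words("MEMORY,100,M"))  # 100000000
```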

The command-line option -m is case sensitive: a lower-case suffix means
decimal, an upper-case suffix means binary:

-m 1k will request 1000 words of memory
-m 1K will request 1024 words of memory
-m 1m will request 1000000 words of memory
-m 1M will request 1048576 words of memory
-m 1g will request 1000000000 words of memory
-m 1G will request 1073741824 words of memory
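The same rule for the command-line suffixes, sketched in Python. `cli_memory_words` is hypothetical; it assumes an integer count followed by at most one suffix, and it treats an argument with no suffix as a plain number of words, which is an assumption rather than documented Molpro behaviour.

```python
# Suffix multipliers for the -m command-line option, as listed above:
# lower case is decimal, upper case is binary (case sensitive).
SUFFIXES = {
    "k": 1000, "m": 1000**2, "g": 1000**3,
    "K": 1024, "M": 1024**2, "G": 1024**3,
}


def cli_memory_words(value: str) -> int:
    """Words requested by an argument such as '-m 1K' (hypothetical helper)."""
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)  # no suffix: assumed to be a plain word count


for arg in ["1k", "1K", "1m", "1M", "1g", "1G"]:
    print(f"-m {arg} -> {cli_memory_words(arg)} words")
```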

Take care when making a conversion, for example:

memory,100,m

is 100,000,000 words, which is 800,000,000 bytes. The conversion to MB
is ambiguous and depends on how you define an MB (1000^2 or 1024^2 bytes).
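The arithmetic for memory,100,m, assuming 8-byte words, shows how far apart the two definitions of a megabyte end up:

```python
BYTES_PER_WORD = 8  # assumes 1 word = sizeof(double) = 8 bytes

words = 100 * 1000 * 1000           # memory,100,m (input scale is decimal)
nbytes = words * BYTES_PER_WORD     # 800,000,000 bytes
mb = nbytes / 1000**2               # decimal megabytes
mib = nbytes / 1024**2              # binary "megabytes" (MiB)
print(f"{nbytes:,} bytes = {mb:.0f} MB (1000^2) = {mib:.1f} MiB (1024^2)")
# prints: 800,000,000 bytes = 800 MB (1000^2) = 762.9 MiB (1024^2)
```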

3. The memory requested is for Molpro's main memory stack, which we
encourage all developers to use. Inevitably, some parts of the code
allocate their own memory dynamically, and there is also some static
memory. Although this should be quite a small amount, you should not
request exactly the amount of free memory on the system (and of course
the OS would not like that either).
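One way to act on point 3 is to derive the per-process value from the machine's memory after setting some aside. This Python sketch is only an illustration: `max_words_per_process` and the 15% reserve are hypothetical choices, not Molpro recommendations, and the right margin depends on the machine and the calculation.

```python
BYTES_PER_WORD = 8  # assumes 1 word = 8 bytes


def max_words_per_process(total_bytes: int, nprocs: int,
                          reserve_fraction: float = 0.15) -> int:
    """Largest per-process 'memory' value that still leaves some headroom.

    reserve_fraction is a hypothetical safety margin for the OS, static data
    and memory allocated outside Molpro's main stack; it is not a value
    recommended by Molpro.
    """
    usable = total_bytes * (1.0 - reserve_fraction)
    return int(usable // (BYTES_PER_WORD * nprocs))


# Example: a hypothetical 32 GiB node running 8 processes
words = max_words_per_process(32 * 1024**3, 8)
print(f"memory,{words // 1_000_000},m")
```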

Best wishes,

Andy

On 20/04/10 17:55, Ronald Kasl wrote:
> Thanks. The computational chemists told me that they are aware of
> that and that they tried different amounts of memory, but the output was
> the same.
> 
> Is there any chance that you could patch the code so that it shows by how
> much the memory needs to be increased? This is what we get now. The
> guys said that changing one line in the code would fix it; they don't
> want to bother you with it, but I thought that you may want to check on
> that.
> 
> This is what it shows in the output file (see the attachment for more):
> ......
> 
>  For full I/O caching in triples, increase memory by********* words to****** Mword
> 
> Thanks,
> Ron
> 
> 
> 
> Manhui Wang wrote:
>> Please be aware that the memory directive in the input is in words (not
>> bytes) per process.
>> For example, the line in your input
>> memory,500,M
>> means you might request 500 Mwords of memory per process. When you run it
>> with 8 processes, it might request 500 Mwords * 8 bytes * 8 processes = 32 GB
>> of memory. If memory allocation in parallel exceeds the total limit, please
>> try reducing the memory or reducing the number of processes.
>>
>> Best wishes,
>> Manhui
>>
>> psc wrote:
>>   
>>> Good morning. Does anybody have any experience with the
>>> sudden death of Molpro? At our site this happens when it runs on 8 cores
>>> of a machine with 2*4 cores. It runs fine for a while, but then
>>> suddenly dies. Before the job dies, the machine still has enough
>>> memory and the disk is only 32% full. Do you have any clues as to what
>>> is happening? How do you troubleshoot such problems with Molpro? The
>>> computational chemist tried to run the same job on 4 cores and the job
>>> runs just fine.
>>>
>>> Thanks!
>>>
>>> This is the last portion of the output file:
>>>
>>>  DF-MP2-F12 correlation energies:
>>>  --------------------------------
>>>  Approx.                                       Singlet          Triplet            Ecorr        Total Energy
>>>  DF-MP2                                -2.105468770835  -1.481892024291  -3.587360795125  -1241.433614075391
>>>  DF-MP2-F12/3*C(DX,FIX)                -3.180235173011  -1.762556768679  -4.942791941690  -1242.789045221956
>>>  DF-MP2-F12/3*C(FIX)                   -3.079029486269  -1.791231138096  -4.870260624365  -1242.716513904631
>>>  DF-MP2-F12/3C(FIX)                    -3.076495219986  -1.793891105189  -4.870386325175  -1242.716639605441
>>>
>>>  SCS-DF-MP2 energies (F_SING= 1.20000  F_TRIP= 0.62222  F_PARALLEL= 0.33333):
>>>
>>> ----------------------------------------------------------------------------
>>>
>>>  SCS-DF-MP2                            -3.448628673449  -1241.294881953715
>>>  SCS-DF-MP2-F12/3*C(DX,FIX)            -4.912984197013  -1242.759237477279
>>>  SCS-DF-MP2-F12/3*C(FIX)               -4.809379202782  -1242.655632483048
>>>  SCS-DF-MP2-F12/3C(FIX)                -4.807993173879  -1242.654246454144
>>>
>>>  Symmetry transformation completed.
>>>
>>>  Number of N-1 electron functions:              63
>>>  Number of N-2 electron functions:            2016
>>>  Number of singly external CSFs:             19467
>>>  Number of doubly external CSFs:         189491778
>>>  Total number of CSFs:                   189511246
>>>
>>>  Pair and operator lists are different
>>>
>>>  Length of J-op  integral file:             163.14 GB
>>>  Length of K-op  integral file:             113.78 GB
>>>  Length of 3-ext integral record:             0.00 MB
>>>
>>>  Memory could be reduced to2370.6 Mword without degradation in triples
>>>
>>>
>>> forrtl: error (69): process interrupted (SIGINT)
>>> forrtl: error (69): process interrupted (SIGINT)
>>> forrtl: error (69): process interrupted (SIGINT)
>>> forrtl: error (69): process interrupted (SIGINT)
>>> forrtl: error (69): process interrupted (SIGINT)
>>> forrtl: error (69): process interrupted (SIGINT)
>>> forrtl: error (69): process interrupted (SIGINT)
>>> Image              PC                Routine            Line        Source
>>> molprop_2009_1_Li  000000000262888F  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  00000000025FFB96  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  000000000219B509  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  000000000219C545  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  000000000219F48D  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  000000000171D1F7  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  00000000017184C5  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  00000000004BAD99  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  00000000004B5AE5  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  000000000043DD2C  Unknown               Unknown  Unknown
>>> libc.so.6          00007F91CAF48ABD  Unknown               Unknown  Unknown
>>> molprop_2009_1_Li  000000000043DC29  Unknown               Unknown  Unknown
>>> [0]0:Return code = 0, signaled with Killed
>>> [0]1:Return code = 1
>>> [0]2:Return code = 1
>>> [0]3:Return code = 1
>>> [0]4:Return code = 1
>>> [0]5:Return code = 1
>>> [0]6:Return code = 1
>>> [0]7:Return code = 1
>>>
>>> _______________________________________________
>>> Molpro-user mailing list
>>> Molpro-user at molpro.net
>>> http://www.molpro.net/mailman/listinfo/molpro-user
>>>     


