[molpro-user] problems running molpro-mpp-2010.1-24.Linux_x86_64 on AMD-based cluster

Anatoliy Volkov Anatoliy.Volkov at mtsu.edu
Tue Jun 19 18:16:43 BST 2012


Greetings,

I seem to have hit a wall trying to get molpro-mpp-2010.1-24.Linux_x86_64 to run
on my AMD-based cluster (16 nodes, 6-core Phenom II X6 1090T or FX-6100 CPUs,
16 GB RAM per node, OpenSUSE 12.1 x86-64, kernel 3.1.10-1.9-desktop), while the
same version of Molpro runs with absolutely no issues on my old Intel-based cluster
(dual-socket quad-core Xeon E5230 CPUs, 16 GB RAM per node, OpenSUSE 11.4 x86-64,
kernel 2.6.37.6-0.11-desktop).

On the AMD cluster, when Molpro starts to run on the master node, it tries to allocate a lot of
memory and then dies. I have taken a couple of snapshots of 'top' (see the attached top.log file).

At first it tries to allocate 9 GB, then 19 GB, then 25 GB, etc., and then it dies with the
following error in the TORQUE log file:

Running Molpro
tmp = /home/avolkov/pdir//usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe.p

 Creating: host=viz01, user=avolkov,
           file=/usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe, port=34803

 60: ListenAndAccept: timeout waiting for connection 0 (0).

0.008u 0.133s 3:01.86 0.0%	0+0k 9344+40io 118pf+0w
 
I am not sure I understand what is happening here. My cluster uses passwordless rsh, and
I have not noticed any issues with communication between the nodes. At least my own code, which
I compile with rsh-enabled OpenMPI, runs just fine on the cluster. Could it be that this version of
Molpro tries to use ssh? But then I do not understand why it works on my Intel cluster, where only
rsh is available...
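
If it helps with the diagnosis, these are the kind of checks I can run from the master node (viz01),
e.g.:

rsh viz02 hostname
rsh viz02 date

and, if this build's TCGMSG-style launcher honours the TCGRSH environment variable (that is only an
assumption on my part, I have not verified it for the 2010.1 binary), setting it explicitly in the
TORQUE script before the molpro command should rule out a silent fall-back to ssh:

export TCGRSH=/usr/bin/rsh    # or: setenv TCGRSH /usr/bin/rsh in a csh script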

On both clusters Molpro has been installed the same way (/usr/local/molpro, NFS-mounted on
all nodes), and pretty much the same TORQUE script is used.
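
To rule out an NFS visibility problem, I can also check from the master node that every worker sees
the same executable (sh syntax, hosts as in the failed job below):

for h in viz01 viz02 viz03 viz04 viz05 viz06 viz07 viz08 viz09 viz10; do
    rsh $h ls -l /usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe
done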

On both clusters, I start Molpro using the following command in my TORQUE script:

time /usr/local/molpro/molpro -m 64M -o $ofile -d $SCR -N $TASKLIST $ifile

where $TASKLIST is defined by the TORQUE script; in the case of the latest failed job
on the AMD cluster it had the following value:
TASKLIST = viz01:6,viz02:6,viz03:6,viz04:6,viz05:6,viz06:6,viz07:6,viz08:6,viz09:6,viz10:6
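
(For reference, a host:count list like this is typically generated from $PBS_NODEFILE with something
along the following lines; the exact one-liner is only illustrative, my script may differ in detail:)

TASKLIST=`sort $PBS_NODEFILE | uniq -c | awk '{printf "%s%s:%s", sep, $2, $1; sep=","}'`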

In the temp directory, file molpro_options.31159 contained:
 -m 64M -o test.out -d /tmp/16.wizard.cs.mtsu.edu test.com
while file procgrp.31159 was as follows:
avolkov viz01 1 /usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe /data1/avolkov/benchmarks/molpro/bul
......
avolkov viz02 1 /usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe /data1/avolkov/benchmarks/molpro/bul
.....
.....
avolkov viz10 1 /usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe /data1/avolkov/benchmarks/molpro/bul

BTW, the test.out file is never created...

Contents of test.com file:
! $Revision: 2006.3 $ 
***,bullvalene                  !A title
memory,64,M                     ! 1 MW = 8 MB
basis=cc-pVTZ
geomtyp=xyz
geometry={
20           ! number of atoms
this is where you put your title
 C          1.36577619     -0.62495122     -0.63870960
 C          0.20245537     -1.27584792     -1.26208804
 C         -1.09275642     -1.01415419     -1.01302123
.........
}
ks,b3lyp
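
For what it's worth, the memory request itself should be modest; using the 1 MW = 8 MB comment above:

  64 MW/process x 8 MB/MW x 6 processes/node = 3072 MB per node,

which is well below the 16 GB available, so the 9-25 GB allocations seen in top look all the more puzzling.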

What am I doing wrong here?

Thank you in advance for your help!

Best Regards,
Anatoliy
---------------------------
Anatoliy Volkov, Ph.D.
Associate Professor
Department of Chemistry
Middle Tennessee State University
-------------- next part --------------
A non-text attachment was scrubbed...
Name: top.log
Type: application/octet-stream
Size: 9182 bytes
Desc: top.log
URL: <http://www.molpro.net/pipermail/molpro-user/attachments/20120619/fc40f20b/attachment.obj>

