[molpro-user] problems running molpro-mpp-2010.1-24.Linux_x86_64 on AMD-based cluster

Andy May MayAJ1 at cardiff.ac.uk
Wed Jun 20 10:21:45 BST 2012


Anatoliy,

Yes, there appears to be some problem when running the binaries on 
openSUSE 12.1. It is not simply a binary incompatibility problem; I see 
the same issue building from source code on 12.1. Apparently, building 
with pure TCGMSG produces an executable which crashes upon the first 
call to a Global Arrays routine. We have this bug reported:

https://www.molpro.net/bugzilla/show_bug.cgi?id=3712

and are looking into a fix.

I see that you have access to the source code, so you can easily build a 
TCGMSG-MPI or MPI2 version from source which should work fine. Please 
let us know if you have any problems building.

Just for information, rsh should be the default for the binaries, but it 
can be changed by setting the TCGRSH environment variable (or by passing 
the --tcgssh option to the bin/molpro shell script).
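
As a minimal sketch of the environment-variable route, the fragment below 
could go near the top of a job script, before Molpro is started. It assumes 
that TCGRSH should hold the full path of the remote-shell program and that 
rsh is preferred, with ssh as the fallback; the helper name 
pick_remote_shell is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Sketch only: choose the remote shell that TCGMSG uses to start processes
# on the other nodes. Assumption: TCGRSH takes the full path of the program.

# Hypothetical helper: print the full path of program $1, fail if not in PATH.
pick_remote_shell() {
    command -v "$1" 2>/dev/null
}

# Prefer rsh (the default for the binaries); fall back to ssh.
remote_shell=$(pick_remote_shell rsh) \
    || remote_shell=$(pick_remote_shell ssh) \
    || remote_shell=""

if [ -n "$remote_shell" ]; then
    TCGRSH=$remote_shell
    export TCGRSH
    echo "TCGRSH=$TCGRSH"
else
    echo "neither rsh nor ssh found in PATH" >&2
fi
```

Whichever program is chosen must allow passwordless logins between all 
nodes listed for the job, otherwise the remote processes cannot be started.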

Best wishes,

Andy

On 19/06/12 18:16, Anatoliy Volkov wrote:
> Greetings,
>
> I seem to have hit a wall trying to get molpro-mpp-2010.1-24.Linux_x86_64 to run
> on my AMD-based cluster (16 nodes, 6-core Phenom II X6 1090T or FX-6100 cpus, and
> 16 GB RAM per node,  OpenSUSE 12.1 x86-64, kernel 3.1.10-1.9-desktop),  while there are
> absolutely no issues running  the same version of Molpro on my old Intel-based cluster
> (dual-socket quad-core Xeon E5230 cpus, 16 GB RAM per node, OpenSUSE 11.4 x86-64,
> kernel 2.6.37.6-0.11-desktop)
>
> On the AMD cluster, when Molpro starts to run on the master nodes, it tries to allocate a lot of memory,
> and then dies. I have taken a couple of snapshots of 'top' (see attached top.log file)
>
> At first it tries to allocate 9GB, then 19 GB, then 25 GB etc., and then it dies with the
> following error in TORQUE log file:
>
> Running Molpro
> tmp = /home/avolkov/pdir//usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe.p
>
>   Creating: host=viz01, user=avolkov,
>             file=/usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe, port=34803
>
>   60: ListenAndAccept: timeout waiting for connection 0 (0).
>
> 0.008u 0.133s 3:01.86 0.0%	0+0k 9344+40io 118pf+0w
>
> I am not sure I understand what is happening here. My cluster uses passwordless rsh and
> I have not noticed any issues with communication between nodes. At least my own code that
> I compile using rsh-enabled OpenMPI runs just fine on the cluster. Could it be that this version of
> Molpro tries to use ssh? But then I do not understand why it works on my Intel cluster where only
> rsh is available...
>
> On both clusters Molpro has been installed the same way (/usr/local/molpro, NFS mounted on
> all nodes) and pretty much the same TORQUE script is used.
>
> On both clusters, I start Molpro using the following command in my TORQUE script:
>
> time /usr/local/molpro/molpro -m 64M -o $ofile -d $SCR -N $TASKLIST $ifile
>
> where $TASKLIST is defined by the TORQUE script and, in the case of the latest failed job
> on the AMD cluster, had the following value:
> TASKLIST = viz01:6,viz02:6,viz03:6,viz04:6,viz05:6,viz06:6,viz07:6,viz08:6,viz09:6,viz10:6
>
> In the temp directory, file molpro_options.31159 contained:
>   -m 64M -o test.out -d /tmp/16.wizard.cs.mtsu.edu test.com
> while file procgrp.31159 was as follows:
> avolkov viz01 1 /usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe /data1/avolkov/benchmarks/molpro/bul
> ......
> avolkov viz02 1 /usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe /data1/avolkov/benchmarks/molpro/bul
> .....
> .....
> avolkov viz10 1 /usr/local/molpro/molprop_2010_1_Linux_x86_64_i8/bin/molpro.exe /data1/avolkov/benchmarks/molpro/bul
>
> BTW, the test.out file is never created.
>
> Contents of test.com file:
> ! $Revision: 2006.3 $
> ***,bullvalene                  !A title
> memory,64,M                     ! 1 MW = 8 MB
> basis=cc-pVTZ
> geomtyp=xyz
> geometry={
> 20           ! number of atoms
> this is where you put your title
>   C          1.36577619     -0.62495122     -0.63870960
>   C          0.20245537     -1.27584792     -1.26208804
>   C         -1.09275642     -1.01415419     -1.01302123
> .........
> }
> ks,b3lyp
>
> What am I doing wrong here?
>
> Thank you in advance for your help!
>
> Best Regards,
> Anatoliy
> ---------------------------
> Anatoliy Volkov, Ph.D.
> Associate Professor
> Department of Chemistry
> Middle Tennessee State University
>



